### Get data from the APIs

-  Use `Requests` module to connect to API and fetch response.
-  Use `JSON.loads()` to convert a JSON object to python dictionary. 

-  Why to use APIs to collecte the data
    -  When the data is being updated real time. If you use downloaded CSV file. yuo have to download the data manually and update the analysis multiple times. using APLs we can automate the analysis process.
    -  Easy access to structured and verified data
    - Access to restricted data

#### Google map geocoding API

-  Join the words in the address using plus and convert it to a form `word+in+the+address`
-  Connect to the URL by appending the address and API keys
-  Get response from the API and convert it into a python object(a dictionries)

In [1]:
import numpy as np
import pandas as pd

import requests, json
import pprint

# join the word in the address by a "+"
add = "UpGrad, Nishuvi building, Anne Besant Road, Worli, Mumbai"
split_add = add.split(" ")
address = "+".join(split_add)
print(address)

UpGrad,+Nishuvi+building,+Anne+Besant+Road,+Worli,+Mumbai


Now we can connect the Google map URL using the API keys and the address and get a response. Like most APIs google map return the geocode data in JSON format, kind of similar to the python dict.

In [3]:
api_key = "AIzaSyBXrK8md7uaOcpRpaluEGZAtdXS4pcI5xo"

url = "https://maps.googleapis.com/maps/api/geocode/json?address={0}&key={1}".format(address, api_key)
r = requests.get(url)
print(r.text)

{
   "results" : [
      {
         "address_components" : [
            {
               "long_name" : "Nishuvi",
               "short_name" : "Nishuvi",
               "types" : [ "premise" ]
            },
            {
               "long_name" : "75",
               "short_name" : "75",
               "types" : [ "street_number" ]
            },
            {
               "long_name" : "Doctor Annie Besant Road",
               "short_name" : "Dr Annie Besant Rd",
               "types" : [ "route" ]
            },
            {
               "long_name" : "Worli",
               "short_name" : "Worli",
               "types" : [ "political", "sublocality", "sublocality_level_1" ]
            },
            {
               "long_name" : "Mumbai",
               "short_name" : "Mumbai",
               "types" : [ "locality", "political" ]
            },
            {
               "long_name" : "Maharashtra",
               "short_name" : "MH",
               "types" : [ "ad

-  Above data is in JSON formatt so we can easily convert the JOSN into the python dict using `json.loads(json_objetc)`

In [7]:
r_dict = json.loads(r.text)
pprint.pprint(r_dict)

{'results': [{'address_components': [{'long_name': 'Nishuvi',
                                      'short_name': 'Nishuvi',
                                      'types': ['premise']},
                                     {'long_name': '75',
                                      'short_name': '75',
                                      'types': ['street_number']},
                                     {'long_name': 'Doctor Annie Besant Road',
                                      'short_name': 'Dr Annie Besant Rd',
                                      'types': ['route']},
                                     {'long_name': 'Worli',
                                      'short_name': 'Worli',
                                      'types': ['political',
                                                'sublocality',
                                                'sublocality_level_1']},
                                     {'long_name': 'Mumbai',
                                      'sh

In [8]:
print(r_dict.keys())

dict_keys(['results', 'status'])


In [20]:
print(r_dict['results'][0]['geometry']['location']['lat'])
print(r_dict['results'][0]['geometry']['location']['lng'])

18.994947
72.816374


-  Above procedure
    -  Gettig the address to asuitable format and connect to the google map using the URL and API keys
    -  Get response from the API and convert it into python dict using `json.loads(json_object)`
    -  Get required info for the given address by sorting the data from the dict and list

#### Write function for get the lattitude and longitude for given address

In [22]:
def get_lat_lng(Address):
    api_key = "AIzaSyBXrK8md7uaOcpRpaluEGZAtdXS4pcI5xo"
    add = Address.split(" ")
    final_add = "+".join(add)
    url = url = "https://maps.googleapis.com/maps/api/geocode/json?address={0}&key={1}".format(final_add, api_key)
    r = requests.get(url)
    r_dict = json.loads(r.text)
    lat = r_dict['results'][0]['geometry']['location']['lat']
    lng = r_dict['results'][0]['geometry']['location']['lng']
    
    return (lat, lng)

In [23]:
# getting some coordinates
print(get_lat_lng("UpGrad, Nishuvi Building, Worli, Mumbai"))
print(get_lat_lng("IIIT Bangalore, Electronic City, Bangalore"))

(18.994947, 72.816374)
(12.8447512, 77.6632317)


Now, what can be a practical use case of using a geocoding API in data analysis? 

Say you are working in an ecommerce retail company, and you have a dataframe containing a list of customer addresses. Your logistics team wants to identify clusters of customers staying close by, so that they can plan the deliveries accordingly.

We have taken some real addresses an examples below. They are stored in a dataframe, and you want to add a column containing the (lat, lng) of each address. 

In [35]:
address_file = r"F:\PGD_UpGrad\Preparatory\Module 5 Python for Data Science\Referance Data\3_Getting_and_Cleaning_Data\3_Getting_and_Cleaning_Data\addresses.txt"
add = pd.read_csv(address_file,sep="\t", header = None)
# renaming the column
add = add.rename(columns={0:'address'})
add.head()

Unnamed: 0,address
0,"777 Brockton Avenue, Abington MA 2351"
1,"30 Memorial Drive, Avon MA 2322"
2,"250 Hartford Avenue, Bellingham MA 2019"
3,"700 Oak Street, Brockton MA 2301"
4,"66-4 Parkhurst Rd, Chelmsford MA 1824"


In [36]:
add.head()['address'].apply(get_lat_lng)

0           (42.0963462, -70.9686115)
1           (42.1210441, -71.0300905)
2    (42.1162105, -71.46537099999999)
3            (42.0981889, -71.056849)
4           (42.6230789, -71.3613232)
Name: address, dtype: object

### Reading the data from the PDF files in python

-  Reading the PDF is not as straightforword as read the text or delimited file, since PDF is contain the image, table
-  We will use the `pyPDF2` to read PDF in python
-  Python is only able to read the text form PDFs, not image and tables

In [4]:
pip install pyPDF2

Note: you may need to restart the kernel to use updated packages.


In [6]:
import PyPDF2

# reading the pdf file
pdf_object = open(r'F:\PGD_UpGrad\Preparatory\Module 5 Python for Data Science\Referance Data\3_Getting_and_Cleaning_Data\3_Getting_and_Cleaning_Data\animal_farm.pdf', 'rb')
pdf_reader = PyPDF2.PdfFileReader(pdf_object)

# Number of pages in the PDF file
print(pdf_reader.numPages)

# get a certain page's text
page_object = pdf_reader.getPage(5)

# Extract text from the page_object
print(page_object.extractText())

55
Cowsandhorses,geeseandturkeys,
Allmusttoilforfreedom'ssake.
BeastsofEngland,beastsofIreland,
Beastsofeverylandandclime,
Hearkenwellandspreadmytidings
Ofthegoldenfuturetime.
Thesingingofthissongthrewtheanimalsintothewildestexcitement.
AlmostbeforeMajorhadreachedtheend,theyhadbegunsingingitforthem-
selves.Eventhestupidestofthemhadalreadypickedupthetuneandafewof
thewords,andasforthecleverones,suchasthepigsanddogs,theyhadthe
entiresongbyheartwithinafewminutes.Andthen,afterafewpreliminary
tries,thewholefarmburstoutinto
BeastsofEngland
intremendousunison.
Thecowslowedit,thedogswhinedit,thesheepbleatedit,thehorseswhinnied
it,theducksquackedit.Theyweresodelightedwiththesongthattheysang
itrightthroughetimesinsuccession,andmighthavecontinuedsingingitall
nightiftheyhadnotbeeninterrupted.
Unfortunately,theuproarawokeMr.Jones,whosprangoutofbed,making
surethattherewasafoxintheyard.Heseizedthegunwhichalwaysstoodina
cornerofhisbedroom,andletyachargeofnumber6shotintothedarkness.
Thepelletsburiedthem



In [2]:
import pandas as pd
import numpy as np

GDP = pd.read_csv(r"F:\PGD_UpGrad\Preparatory\Assignment\GDPanalysis\AllStatesGDP.csv")
GDP.head()

Unnamed: 0,Items Description,Duration,Andhra Pradesh,Arunachal Pradesh,Assam,Bihar,Chhattisgarh,Goa,Gujarat,Haryana,...,Telangana,Tripura,Uttar Pradesh,Uttarakhand,West Bengal1,Andaman & Nicobar Islands,Chandigarh,Delhi,Puducherry,All_India GDP
0,GSDP - CURRENT PRICES (` in Crore),2011-12,379402.0,11063.0,143175.0,247144.0,158074.0,42367.0,615606.0,297539.0,...,359433.0,19208.0,724049.0,115523.0,,3979.0,18768.0,343767.0,16818.0,8736039.0
1,GSDP - CURRENT PRICES (` in Crore),2012-13,411404.0,12547.0,156864.0,282368.0,177511.0,38120.0,724495.0,347032.0,...,401493.0,21663.0,822903.0,131835.0,,4421.0,21609.0,391238.0,18875.0,9946636.0
2,GSDP - CURRENT PRICES (` in Crore),2013-14,464272.0,14602.0,177745.0,317101.0,206690.0,35921.0,807623.0,400662.0,...,452186.0,25593.0,944146.0,149817.0,,5159.0,24787.0,443783.0,21870.0,11236635.0
3,GSDP - CURRENT PRICES (` in Crore),2014-15,526468.0,16761.0,198098.0,373920.0,234982.0,40633.0,895027.0,437462.0,...,511178.0,29667.0,1043371.0,161985.0,,5721.0,27844.0,492424.0,24089.0,12433749.0
4,GSDP - CURRENT PRICES (` in Crore),2015-16,609934.0,18784.0,224234.0,413503.0,260776.0,45002.0,994316.0,485184.0,...,575631.0,,1153795.0,184091.0,,,30304.0,551963.0,26533.0,13675331.0
