
# Proof of Concept: Google Places API, pandas and NLP Modelling

Below is a proof of concept showing how to use the Google Places API to extract data for plumbing and/or HVAC businesses in Chicago and New York City. We will then use CountVectorizer from sklearn for some basic NLP modeling. The first step is to set up and extract data using the Google Places API.

*Created by Marcela Soriano*


## 1. Google API key to make requests to the Google Places API

The code presented below interacts with the Google Places API to fetch details of certain types of businesses (like plumbing and HVAC) located in two specific cities: Chicago and New York City.

```python
import requests
import pandas as pd

# Replace 'YOUR_GOOGLE_API_KEY' with your own Google API key
API_KEY = 'YOUR_GOOGLE_API_KEY'
BASE_URL = 'https://maps.googleapis.com/maps/api/place/textsearch/json'
DETAILS_URL = 'https://maps.googleapis.com/maps/api/place/details/json'

def get_place_details(place_id):
    # Fetch the phone number and website for a single place
    params = {
        'place_id': place_id,
        'key': API_KEY
    }
    response = requests.get(DETAILS_URL, params=params)
    details = response.json().get('result', {})
    return {
        'phone_number': details.get('formatted_phone_number'),
        'website': details.get('website')
    }

def get_places(city, keywords):
    # Build a free-text query such as 'Chicago plumbing HVAC'
    query = f'{city} {" ".join(keywords)}'
    params = {
        'query': query,
        'key': API_KEY
    }
    response = requests.get(BASE_URL, params=params)
    results = response.json().get('results', [])

    data = []
    for place in results:
        # One extra Place Details request per search result
        details = get_place_details(place.get('place_id'))
        data.append({
            'name': place.get('name'),
            'address': place.get('formatted_address'),
            'types': ', '.join(place.get('types', [])),
            'phone_number': details.get('phone_number'),
            'website': details.get('website')
        })

    return data

# Get data for specified keywords in the cities
keywords = ['plumbing', 'HVAC']

chicago_data = get_places('Chicago', keywords)
nyc_data = get_places('New York City', keywords)

chicago_df = pd.DataFrame(chicago_data)
nyc_df = pd.DataFrame(nyc_data)

# Print the first few records for verification
print(chicago_df.head())
print(nyc_df.head())
```

```
                                                name  \
0                      Illinois Best Plumbing & HVAC
1          Burris & Sons Heating, Cooling & Plumbing
2                      Kavana's Plumbing and Heating
3          South Chicago Plumbing And Heating Supply
4  Four Seasons Heating, Air Conditioning, Plumbi...

                                             address  \
0  2820 W 48th Pl Suite 2126, Chicago, IL 60632, ...
1  7850 S Colfax Ave, Chicago, IL 60649, United S...
2  5901 W Corcoran Pl, Chicago, IL 60644, United ...
3  9275 S South Chicago Ave, Chicago, IL 60617, U...
4   5701 W 73rd St, Chicago, IL 60638, United States

                                               types    phone_number  \
0          plumber, point_of_interest, establishment  (224) 212-1317
1  plumber, general_contractor, point_of_interest...  (773) 375-4123
2  plumber, general_contractor, point_of_interest...  (773) 908-5454
3            point_of_interest, store,
```

### Explanation
* Initialization: The API key and necessary URLs for interaction with the Google Places API are defined.
* Function - get_place_details: This function fetches detailed information for a given place identified by its place_id. Specifically, it retrieves the phone number and website of the place.
* Function - get_places: This function searches for places in a specified city that match given keywords. For each place found, it also calls the get_place_details function to get more detailed information.
* Fetching Data: Keywords (in this case 'plumbing' and 'HVAC') are specified to search for related businesses in Chicago and New York City.
* Data Conversion: The data retrieved for each city is then converted into pandas DataFrames for easier manipulation and analysis.
* Output: The first few records of each DataFrame are printed for verification.
* Note: The API key authenticates requests to the Google Places API. Keep it confidential, and replace it with your own valid key before running the code.
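
The `get_places` function reads only the first page of `results` and does not inspect the `status` field that the Places API includes in each JSON response. A minimal sketch of both steps follows, assuming the legacy Text Search behaviour of returning a `next_page_token` when more results are available. The names `collect_results` and `fetch_page` are hypothetical; `fetch_page` stands in for a `requests.get(...).json()` call so that the logic can be tried without a network connection or an API key.

```python
import time

def collect_results(fetch_page, max_pages=3, delay_seconds=2.0):
    """Gather results across pages (hypothetical helper, not part of the notebook).

    fetch_page(page_token) must return the decoded JSON of one Text Search
    response; page_token is None for the first page.
    """
    results = []
    page_token = None
    for _ in range(max_pages):
        payload = fetch_page(page_token)
        status = payload.get('status')
        # 'OK' and 'ZERO_RESULTS' are normal outcomes; anything else
        # (for example 'REQUEST_DENIED') signals a problem with the request.
        if status not in ('OK', 'ZERO_RESULTS'):
            raise RuntimeError(
                f"Places API returned {status}: {payload.get('error_message', '')}"
            )
        results.extend(payload.get('results', []))
        page_token = payload.get('next_page_token')
        if not page_token:
            break
        # Assumption: a fresh token needs a short pause before it is accepted.
        time.sleep(delay_seconds)
    return results


# Offline stand-in for the API: two fake pages keyed by page token.
fake_pages = {
    None: {'status': 'OK', 'results': [{'name': 'A'}], 'next_page_token': 't1'},
    't1': {'status': 'OK', 'results': [{'name': 'B'}]},
}
names = [p['name'] for p in collect_results(fake_pages.get, delay_seconds=0)]
print(names)
```

In the notebook, `fetch_page` could wrap `requests.get(BASE_URL, params=params).json()`, adding a `pagetoken` entry to `params` whenever a token is passed in.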

## 2. CountVectorizer for Basic NLP Modeling
The code below analyzes the frequency of different types/categories associated with places in two cities: Chicago and New York City.
This will give us an idea of the most common types/categories associated with these businesses:

```python
from sklearn.feature_extraction.text import CountVectorizer

def analyze_types_frequency(df):
    # Initialize the CountVectorizer
    vectorizer = CountVectorizer()

    # Fit and transform the data
    X = vectorizer.fit_transform(df['types'])

    # Convert to DataFrame for better visualization
    freq_df = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out())

    # Sum each column to get frequency of each word/type
    type_frequency = freq_df.sum().sort_values(ascending=False)

    return type_frequency

# Analyze frequency for both cities
chicago_frequency = analyze_types_frequency(chicago_df)
nyc_frequency = analyze_types_frequency(nyc_df)

print("Chicago Types Frequency:\n", chicago_frequency)
print("\nNew York City Types Frequency:\n", nyc_frequency)
```

```
Chicago Types Frequency:
 establishment         20
point_of_interest     20
general_contractor    14
plumber               14
store                  2
electrician            1
dtype: int64

New York City Types Frequency:
 establishment         20
point_of_interest     20
plumber               18
general_contractor    15
store                  3
hardware_store         1
home_goods_store       1
dtype: int64
```


### Explanation
* Function - analyze_types_frequency: Takes a DataFrame (df) as an argument, which is assumed to have a column named 'types'. It initializes the CountVectorizer from scikit-learn, which converts the 'types' strings into a matrix of token counts.
* Conversion: The matrix is then converted into a DataFrame (freq_df) for visualization purposes.
* Frequency: Finally, the frequency of each type is computed by summing each column, and the result is sorted in descending order. The function returns these frequencies.
* Analysis for Both Cities: The function is applied to the data of both Chicago (chicago_df) and New York City (nyc_df) to retrieve the frequency distribution of types in each city. The results for both cities are then printed.
* Output: The output displays the frequencies of each type/category in both cities, indicating which types/categories are more common.
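
To make the counting step concrete, the sketch below reproduces the same kind of tally using only the standard library. It assumes each 'types' value is a comma-separated string, as built in `get_places`; the sample rows and the name `count_types` are illustrative and are not taken from the API output. `CountVectorizer` tokenizes with a regular expression rather than by splitting on commas, so the two approaches should agree here only because each type is a single lowercase token made of word characters and underscores.

```python
from collections import Counter

def count_types(type_strings):
    """Count how often each type appears across comma-separated strings."""
    counts = Counter()
    for entry in type_strings:
        # Split 'plumber, store' into ['plumber', 'store'] and skip empties.
        counts.update(t.strip() for t in entry.split(',') if t.strip())
    return counts

# Illustrative rows in the same format as the 'types' column.
sample_types = [
    'plumber, point_of_interest, establishment',
    'plumber, general_contractor, point_of_interest, establishment',
    'point_of_interest, store, establishment',
]
type_counts = count_types(sample_types)
for type_name, count in type_counts.most_common():
    print(f'{type_name:<20}{count}')
```

Types such as `establishment` and `point_of_interest` appear on every sample row, which mirrors why they top both frequency lists in the notebook output.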



With the above proof of concept and script, you can extract data specifically for plumbing and/or HVAC businesses in the specified cities and analyze the frequency of the types/categories associated with them. Again, ensure that the `API_KEY` variable holds your own valid Google API key before running the code.
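
One way to keep the key out of the notebook itself is to read it from an environment variable. This is a minimal sketch; the variable name `GOOGLE_PLACES_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not names defined by Google or by this notebook.

```python
import os

def load_api_key(env_var='GOOGLE_PLACES_API_KEY'):
    """Return the key stored in the given environment variable.

    The variable name is a hypothetical choice, not a Google convention.
    """
    key = os.environ.get(env_var, '').strip()
    if not key:
        raise RuntimeError(f'Set the {env_var} environment variable before running.')
    return key

# The notebook's API_KEY could then be assigned with:
# API_KEY = load_api_key()
```

Failing early with a clear message avoids sending requests that the API would reject for a missing key.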




