# Data Enrichment for Bad Data

# Google Places API Integration and Data Enrichment Process

### 1. Setting Up Google Cloud Console

I started by opening the **Google Cloud Console** and creating a new project to manage and track API usage; a project is required before any Google API can be enabled. After creating the project, I navigated to **APIs & Services > Library** and enabled the **Google Places API**, which provides business details such as addresses and phone numbers.

### 2. Creating the API Key

Next, I created an **API key** to authenticate requests to the Google Places API. The API key is used by the code to send requests to the API and retrieve business information like addresses and phone numbers. This key must be kept private to avoid unauthorized use.
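One common way to keep the key out of the notebook is to read it from an environment variable. A minimal sketch, assuming a hypothetical variable name `GOOGLE_PLACES_API_KEY` and a hypothetical helper `load_api_key()`:

```python
import os

# Hypothetical environment variable name; pick any name your team agrees on
API_KEY_ENV_VAR = "GOOGLE_PLACES_API_KEY"


def load_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Read the Places API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Google Places API key."
        )
    return key
```

Set the variable in your shell before starting Jupyter (for example `export GOOGLE_PLACES_API_KEY=...` on Linux/macOS), then use `API_KEY = load_api_key()` in place of the hard-coded string.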


### 3. Writing the Code

I developed two key functions to interact with the Google Places API:

- **`get_place_id()`**: This function builds a text query from the company name, street address, and country and sends it to the Google Places **Find Place** endpoint. It returns the **Place ID** and **formatted address** of the first matching candidate; the Place ID is required to fetch further details.
  
- **`get_place_details()`**: After retrieving the **Place ID** from the previous function, this function queries the **Place Details API** to retrieve additional information about the business, such as the **phone number**. The function returns the phone number if it’s available.

These functions help retrieve both the address and phone number of the business, which can then be used to enrich the dataset.
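Separating the HTTP request from the parsing makes this two-step lookup easy to check offline. The helper names below are hypothetical and the sample responses are made up; their shape (a `candidates` list from Find Place, a `result` object from Place Details) mirrors the fields the notebook code reads:

```python
from typing import Optional


def build_query(company_name: str, street: Optional[str] = None,
                country: Optional[str] = None) -> str:
    """Join the non-empty parts into one free-text search string."""
    return ", ".join(part for part in (company_name, street, country) if part)


def parse_find_place(data: dict) -> Optional[dict]:
    """Return place_id and formatted_address of the first candidate, or None."""
    candidates = data.get("candidates", [])
    if not candidates:
        return None
    first = candidates[0]
    return {"place_id": first.get("place_id"),
            "address": first.get("formatted_address")}


def parse_place_details(data: dict) -> Optional[str]:
    """Return formatted_phone_number from a Place Details response, or None."""
    return data.get("result", {}).get("formatted_phone_number")


# Made-up responses shaped like the JSON the notebook reads
find_place_response = {
    "candidates": [{"place_id": "example-place-id",
                    "formatted_address": "Example Str. 1, 12345 Example City, Germany"}],
    "status": "OK",
}
details_response = {"result": {"formatted_phone_number": "0123 456789"}, "status": "OK"}

print(build_query("Example GmbH", "Example Str. 1", "DE"))  # Example GmbH, Example Str. 1, DE
print(parse_find_place(find_place_response))
print(parse_place_details(details_response))  # 0123 456789
```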

### 4. Enriching the Dataset

The main function, **`enrich_bad_data()`**, iterates over a DataFrame (`bad_data`) that contains company information. The goal is to fill in fields that are missing or unusable, such as the **cleaned phone number** or **street address**; note that the function does not check these fields first and sends a query for every row. For each company, it first uses `get_place_id()` to query the **Google Places API** for the **Place ID** and address.

If a **Place ID** is found, the function then calls `get_place_details()` to retrieve the business's phone number. If either the address or phone number is found, it is added to new columns (`new_address` and `new_phone_number`) in the DataFrame.

An additional column, **`enrichment`**, is used to track whether any new data was added to the row. If the business's information is successfully enriched (i.e., a new address or phone number is added), the row is marked as **"enriched"**.
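As written, `enrich_bad_data()` queries every row. If you want to query only rows where a field is actually empty, a minimal sketch, assuming a hypothetical helper `rows_needing_enrichment()` and the dataset's `cleaned phone number` and `street` columns, is:

```python
import pandas as pd


def rows_needing_enrichment(df: pd.DataFrame,
                            columns=("cleaned phone number", "street")) -> pd.DataFrame:
    """Return the rows where at least one of the given columns is empty (NaN/None)."""
    missing = df[list(columns)].isna().any(axis=1)
    return df[missing]


# Small made-up example
sample = pd.DataFrame({
    "firma": ["A GmbH", "B GmbH", "C GmbH"],
    "street": ["Hauptstr. 1", None, "Ringstr. 2"],
    "cleaned phone number": ["4930123456", "4940123456", None],
})
print(rows_needing_enrichment(sample)["firma"].tolist())  # ['B GmbH', 'C GmbH']
```

Because the filtered frame keeps the original index labels, a loop such as `for index, row in rows_needing_enrichment(df).iterrows():` can still write results back with `df.at[index, ...]`. This only catches empty cells; a phone number that is present but malformed still needs the existing `flag` column or another check.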

### Code Explanation

- **`get_place_id()`**:
  - This function sends a request to the Google Places API to retrieve the **Place ID** and **formatted address** of a business based on the company name, street, and country.
  - It returns the **Place ID** and address if found.

- **`get_place_details()`**:
  - This function uses the **Place ID** obtained from `get_place_id()` to query the **Place Details API** and retrieve the business's **phone number**.
  - It returns the phone number if available.

- **`enrich_bad_data()`**:
  - This function loops through each row of the DataFrame and calls `get_place_id()` with the company name, street, and country. It does not first check whether the phone number or address is actually missing; every row is queried.
  - If a Place ID is returned, it calls `get_place_details()` to fetch the phone number.
  - The retrieved data is stored in two new columns: `new_address` and `new_phone_number`. The **enrichment status** is tracked with an additional column called `enrichment`, indicating whether the row has been updated.
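To check the enrichment and flagging logic without network calls or an API key, the two lookups can be passed in as functions. The sketch below is a hypothetical, simplified restatement of that logic (`enrich()`, the stub dictionaries, and the one-argument lookup signatures are assumptions for illustration), not the notebook's own function:

```python
from typing import Callable, Optional

import pandas as pd


def enrich(df: pd.DataFrame,
           find_place: Callable[[str], Optional[dict]],
           get_phone: Callable[[str], Optional[str]]) -> pd.DataFrame:
    """Fill new_address/new_phone_number and flag rows where anything was added."""
    df = df.copy()
    df["new_address"] = None
    df["new_phone_number"] = None
    df["enrichment"] = "not enriched"
    for index, row in df.iterrows():
        info = find_place(row["firma"])
        if not info:
            continue  # no candidate found: the row stays "not enriched"
        enriched = False
        if info.get("place_id"):
            phone = get_phone(info["place_id"])
            if phone:
                df.at[index, "new_phone_number"] = phone
                enriched = True
        if info.get("address"):
            df.at[index, "new_address"] = info["address"]
            enriched = True
        if enriched:
            df.at[index, "enrichment"] = "enriched"
    return df


# Stub lookups standing in for get_place_id() and get_place_details()
fake_places = {"A GmbH": {"place_id": "p1", "address": "Addr A"}}
fake_phones = {"p1": "0123 4567"}

sample = pd.DataFrame({"firma": ["A GmbH", "B GmbH"]})
result = enrich(sample, fake_places.get, fake_phones.get)
print(result[["firma", "new_address", "new_phone_number", "enrichment"]])
```

Passing the lookups in as arguments keeps the row logic independent of the HTTP layer, so the same loop can run against the real API functions or against stubs.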

### Summary

This process allows for filling in missing business data (like **addresses** and **phone numbers**) using the **Google Places API**, enriching the dataset with up-to-date information. The enriched data is saved in new columns (`new_address`, `new_phone_number`), and the process is tracked using an **enrichment status column** to indicate which rows have been successfully updated.

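<span style = "color:red" > [Note] With a valid API key, new addresses and phone numbers were retrieved for about 80% of the leads during the enrichment process. The outputs shown below were produced with a placeholder key, so every request was denied and all rows remain "not enriched". </span>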
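The percentage in the note is the share of rows whose `enrichment` value is `"enriched"`. A minimal sketch of that arithmetic on a made-up column (on the real data you would use `enriched_bad_data['enrichment']` instead):

```python
import pandas as pd

# Made-up enrichment column: 4 of 5 rows enriched
status = pd.Series(["enriched", "enriched", "not enriched", "enriched", "enriched"],
                   name="enrichment")

print(status.value_counts(normalize=True))  # share of each label
enriched_pct = 100 * (status == "enriched").mean()
print(f"{enriched_pct:.0f}% of rows enriched")  # 80% of rows enriched
```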


### Imports and Data Loading

In [1]:
```python
import pandas as pd
import requests # allows you to send HTTP requests (such as GET or POST) to web services or APIs and retrieve responses
# pip install requests - if needed
```

In [2]:
```python
# file location
file_path = '../Inputs/leads_in_review_data.xlsx' # relative path
# read data
bad_data = pd.read_excel(file_path)
```
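Empty cells in the Excel file (for example the empty `telefon` values in the last rows of the output further down) are read by `pd.read_excel` as `NaN`. `NaN` is truthy in Python, so a check like `if street:` does not skip it, and the string `nan` can end up in a search query. A small illustration, with a hypothetical helper `clean()` that maps missing values to `None`:

```python
import pandas as pd

street = float("nan")  # what an empty Excel cell becomes after pd.read_excel

print(bool(street))               # True: NaN passes an `if street:` check
print(f"Example GmbH, {street}")  # Example GmbH, nan
print(pd.notna(street))           # False: the reliable test for a missing value


def clean(value):
    """Return None for NaN/None so optional query parts can be skipped."""
    return value if pd.notna(value) else None


print(clean(street), clean("Ottostraße 6"))  # None Ottostraße 6
```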

In [3]:
```python
import pandas as pd
import requests

# Replace the placeholder with your own Google Places API key and keep it private
API_KEY = 'insert_your_API_Key_here_for_the_code_to_run'  # if necessary, please ask for the key


# Function to fetch the Place ID and formatted address using the Find Place API
def get_place_id(company_name, street=None, country=None):
    try:
        search_url = "https://maps.googleapis.com/maps/api/place/findplacefromtext/json"
        query = f"{company_name}"
        if street:
            query += f", {street}"
        if country:
            query += f", {country}"

        params = {
            'input': query,
            'inputtype': 'textquery',
            'fields': 'place_id,formatted_address',
            'key': API_KEY
        }
        # A timeout keeps one hanging request from blocking the whole loop
        response = requests.get(search_url, params=params, timeout=10)
        data = response.json()  # parse the JSON body once
        print(f"API Response for {company_name}: {data}")  # Debugging: print API response
        results = data.get('candidates', [])
        if results:
            business_info = results[0]  # use the first candidate
            return {
                'place_id': business_info.get('place_id'),
                'address': business_info.get('formatted_address')
            }
        return None
    except Exception as e:
        print(f"Error fetching place ID for {company_name}: {e}")
        return None


# Function to fetch additional details (like the phone number) using the Place Details API
def get_place_details(place_id):
    try:
        details_url = "https://maps.googleapis.com/maps/api/place/details/json"
        params = {
            'place_id': place_id,
            'fields': 'formatted_phone_number',
            'key': API_KEY
        }
        response = requests.get(details_url, params=params, timeout=10)
        data = response.json()
        print(f"Place Details Response: {data}")  # Debugging: print API response
        result = data.get('result', {})
        return {
            'phone': result.get('formatted_phone_number')
        }
    except Exception as e:
        print(f"Error fetching place details: {e}")
        return None


# Function to enrich the bad_data DataFrame and add new columns
def enrich_bad_data(df):
    df = df.copy()  # work on a copy so the original bad_data stays unchanged
    df['new_address'] = None  # Initialize new address column
    df['new_phone_number'] = None  # Initialize new phone number column
    df['enrichment'] = 'not enriched'  # Initialize the enrichment column

    for index, row in df.iterrows():
        enrichment_made = False  # Track whether any enrichment was made

        company_name = row['firma']
        street = row.get('street')
        country = row.get('country')
        # Empty Excel cells are read as NaN, which is truthy; turn them into None
        street = street if pd.notna(street) else None
        country = country if pd.notna(country) else None

        # Fetch business info using the Google Places API
        business_info = get_place_id(company_name, street, country)

        if business_info:
            place_id = business_info.get('place_id')
            if place_id:
                # Fetch phone number and other details using the Place Details API
                place_details = get_place_details(place_id)

                # Add the new phone number if available (place_details is None on errors)
                if place_details and place_details.get('phone'):
                    df.at[index, 'new_phone_number'] = place_details['phone']
                    enrichment_made = True

            # Add the new address if available
            if business_info.get('address'):
                df.at[index, 'new_address'] = business_info['address']
                enrichment_made = True

        # Mark as "enriched" if any new data was added
        if enrichment_made:
            df.at[index, 'enrichment'] = 'enriched'

    return df


# Enrich the dataset by adding new information as columns (e.g., address, phone number)
enriched_bad_data = enrich_bad_data(bad_data)

# Display the enriched data with new columns
enriched_bad_data
```

```text
API Response for Autohaus Hentschel GmbH: {'candidates': [], 'error_message': 'The provided API key is invalid.', 'status': 'REQUEST_DENIED'}
API Response for Autohaus Siegmar GmbH: {'candidates': [], 'error_message': 'The provided API key is invalid.', 'status': 'REQUEST_DENIED'}
API Response for Autohaus Zückner GmbH & Co. KG: {'candidates': [], 'error_message': 'The provided API key is invalid.', 'status': 'REQUEST_DENIED'}
API Response for Auto Böhler: {'candidates': [], 'error_message': 'The provided API key is invalid.', 'status': 'REQUEST_DENIED'}
API Response for EJP Frank Bach & Katarzyna Bach GbR: {'candidates': [], 'error_message': 'The provided API key is invalid.', 'status': 'REQUEST_DENIED'}
API Response for BREMSKERL-REIBBELAGWERKE Emmerling GMBH & CO. KG: {'candidates': [], 'error_message': 'The provided API key is invalid.', 'status': 'REQUEST_DENIED'}
API Response for Werner Bölling GmbH: {'candidates': [], 'error_message': 'The provided API key is invalid.', 'status'
```

| Unnamed: 0 | firma | street | plz | city | telefon | country | country code | cleaned phone number | flag | salutation | first name | surname | digit length | firma length | unique_id | new_address | new_phone_number | enrichment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Autohaus Hentschel GmbH | Vahrenwalder Str. 141 | 30165 | Hannover | +49_x001D_17_x0011_86221169 | DE | 49 | 117001186221169 | bad data | No data | No data | No data | 15.0 | 23.0 | bad_1 |  |  | not enriched |
| 1 | Autohaus Siegmar GmbH | Anton-Erhardt-Straße 5 | 9117 | Chemnitz | +_x001D__x0004_49179_x0008_9703167 | DE | 49 | 100044917900089703167 | bad data | No data | No data | No data | 21.0 | 21.0 | bad_2 |  |  | not enriched |
| 2 | Autohaus Zückner GmbH & Co. KG | Gildestraße 5 | 91154 | Roth | 0049176-9142078 | DE | 49 | 1769142078 | bad data | No data | No data | No data | 10.0 | 30.0 | bad_3 |  |  | not enriched |
| 3 | Auto Böhler | Ottostraße 6 | 76227 | Karlsruhe | /179/00182_x000C__x0007_38 | DE | 49 | 17900182000000738 | bad data | No data | No data | No data | 17.0 | 11.0 | bad_4 |  |  | not enriched |
| 4 | EJP Frank Bach & Katarzyna Bach GbR | Elisabethstrasse 24 | 2826 | Görlitz | Hotline: 176-0699874 (+49) | DE | 49 | 1760699874 | bad data | No data | No data | No data | 10.0 | 35.0 | bad_5 |  |  | not enriched |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 379 | Kunz Metallbau GmbH | Adolf-Todt-Str. 28 | 65203 | Wiesbaden |  | DE | 49 |  | bad data | Herr | Richard | Kunz |  | 19.0 | bad_380 |  |  | not enriched |
| 380 | Gebrüder Grüske GmbH | Meisenweg 17 | 82110 | Germering |  | DE | 49 |  | bad data | Herr | Werner | Grüske |  | 20.0 | bad_381 |  |  | not enriched |
| 381 | Ofenbau Unterseher GmbH | Kufsteiner Str. 49 | 83126 | Flintsbach am Inn |  | DE | 49 |  | bad data | Herr | Georg | Unterseher |  | 23.0 | bad_382 |  |  | not enriched |
| 382 | Fastr GmbH | Kurfürstendamm 217 | 10719 | Berlin |  | DE | 49 |  | bad data | Herr | Achim | Gasper |  | 10.0 | bad_383 |  |  | not enriched |
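In the log above, every request fails with `REQUEST_DENIED` because of the placeholder key, yet the loop keeps sending one request per row. A minimal sketch of a fail-fast check, assuming a hypothetical exception class `PlacesAPIError` and a hypothetical helper `check_status()`:

```python
class PlacesAPIError(RuntimeError):
    """Raised when the Places API refuses a request outright (e.g. invalid key)."""


def check_status(data: dict) -> dict:
    """Return the parsed response unchanged, or raise on REQUEST_DENIED."""
    if data.get("status") == "REQUEST_DENIED":
        raise PlacesAPIError(data.get("error_message", "REQUEST_DENIED"))
    return data


# The response logged above for the placeholder key
denied = {'candidates': [], 'error_message': 'The provided API key is invalid.',
          'status': 'REQUEST_DENIED'}

try:
    check_status(denied)
except PlacesAPIError as err:
    print(f"Stopping enrichment: {err}")  # Stopping enrichment: The provided API key is invalid.
```

To use it inside `get_place_id()`, the parsed response would pass through `check_status()`, and the broad `except Exception` block would need an `except PlacesAPIError: raise` clause in front of it; otherwise the error is swallowed and the loop continues as before. `ZERO_RESULTS` is a normal outcome (no match) and is deliberately not treated as an error here.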


In [4]:
```python
enriched_bad_data['enrichment'].value_counts()
```

```text
enrichment
not enriched    384
Name: count, dtype: int64
```

In [5]:
```python
enriched_bad_data.head()
```

| Unnamed: 0 | firma | street | plz | city | telefon | country | country code | cleaned phone number | flag | salutation | first name | surname | digit length | firma length | unique_id | new_address | new_phone_number | enrichment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Autohaus Hentschel GmbH | Vahrenwalder Str. 141 | 30165 | Hannover | +49_x001D_17_x0011_86221169 | DE | 49 | 117001186221169 | bad data | No data | No data | No data | 15.0 | 23.0 | bad_1 |  |  | not enriched |
| 1 | Autohaus Siegmar GmbH | Anton-Erhardt-Straße 5 | 9117 | Chemnitz | +_x001D__x0004_49179_x0008_9703167 | DE | 49 | 100044917900089703167 | bad data | No data | No data | No data | 21.0 | 21.0 | bad_2 |  |  | not enriched |
| 2 | Autohaus Zückner GmbH & Co. KG | Gildestraße 5 | 91154 | Roth | 0049176-9142078 | DE | 49 | 1769142078 | bad data | No data | No data | No data | 10.0 | 30.0 | bad_3 |  |  | not enriched |
| 3 | Auto Böhler | Ottostraße 6 | 76227 | Karlsruhe | /179/00182_x000C__x0007_38 | DE | 49 | 17900182000000738 | bad data | No data | No data | No data | 17.0 | 11.0 | bad_4 |  |  | not enriched |
| 4 | EJP Frank Bach & Katarzyna Bach GbR | Elisabethstrasse 24 | 2826 | Görlitz | Hotline: 176-0699874 (+49) | DE | 49 | 1760699874 | bad data | No data | No data | No data | 10.0 | 35.0 | bad_5 |  |  | not enriched |


## Writing the Data to an Excel File

In [6]:
```python
# Uncomment the line below to write the enriched data to an Excel file again
# enriched_bad_data.to_excel('../Inputs/leads_enriched_data.xlsx', index=False)
```