# Creating a Master Dataset

To streamline the analysis, I consolidated three data sources (Google clicks, Facebook impressions, and Product A quantity data) into a single master dataset, so that every metric is available in one table.

# Importing the Pandas Library

I imported the **Pandas** library, which this notebook uses to read the Excel files, merge them, and export the result.


In [None]:
import pandas as pd  # Import the pandas library for data manipulation and analysis

# Loading Individual Datasets

To begin the data consolidation process, I loaded three separate datasets into individual DataFrames:

1. **Google Clicks**:
   - File Path: `C:\Users\nitin\OneDrive\Documents\Infosys Springboard Internship Files\NitinMishra-Infosys-Nov24\datasets\ProductA_google_clicks.xlsx`
   - Contains data on clicks generated through Google.

2. **Facebook Impressions**:
   - File Path: `C:\Users\nitin\OneDrive\Documents\Infosys Springboard Internship Files\NitinMishra-Infosys-Nov24\datasets\ProductA_fb_impressions.xlsx`
   - Includes impression data from Facebook.

3. **Quantity Data**:
   - File Path: `C:\Users\nitin\OneDrive\Documents\Infosys Springboard Internship Files\NitinMishra-Infosys-Nov24\datasets\ProductA.xlsx`
   - Provides sales-related data for Product A.

Each file was read into its own Pandas DataFrame with the `read_excel` function, which loads the first worksheet of a workbook by default. The three DataFrames are then ready for merging and further processing.


In [None]:
from pathlib import Path

# Folder that holds the three source workbooks
data_dir = Path(r'C:\Users\nitin\OneDrive\Documents\Infosys Springboard Internship Files\NitinMishra-Infosys-Nov24\datasets')

# Load each Excel file into a DataFrame
google_clicks = pd.read_excel(data_dir / 'ProductA_google_clicks.xlsx')
fb_impressions = pd.read_excel(data_dir / 'ProductA_fb_impressions.xlsx')
quantity = pd.read_excel(data_dir / 'ProductA.xlsx')
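
Before merging, it can be worth confirming that each DataFrame actually contains the `Day Index` column and that its values are unique. The following is a minimal sketch on small made-up frames: the helper name `check_key`, the `Clicks` and `Impressions` columns, and all values are hypothetical and not taken from the project files.

```python
import pandas as pd

KEY = "Day Index"  # join column named in this notebook


def check_key(df: pd.DataFrame, name: str, key: str = KEY) -> None:
    """Raise ValueError if `key` is missing from `df` or has repeated values."""
    if key not in df.columns:
        raise ValueError(f"{name}: column {key!r} not found")
    if df[key].duplicated().any():
        raise ValueError(f"{name}: column {key!r} contains duplicate values")


# Hypothetical stand-ins for the loaded workbooks (illustrative values only)
clicks_demo = pd.DataFrame({
    "Day Index": pd.to_datetime(["2021-12-01", "2021-12-02"]),
    "Clicks": [445, 433],
})
impressions_demo = pd.DataFrame({
    "Day Index": pd.to_datetime(["2021-12-01", "2021-12-01"]),  # repeated day
    "Impressions": [620, 890],
})

check_key(clicks_demo, "clicks_demo")  # passes silently
try:
    check_key(impressions_demo, "impressions_demo")
except ValueError as err:
    print(err)  # reports the duplicate key
```

The same helper could be called on `google_clicks`, `fb_impressions`, and `quantity` once they are loaded.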


# Merging Datasets

To create a unified master dataset, I merged the three individual DataFrames on their common column, **`Day Index`**. Here's what this step accomplished:

- Combined data from:
  - **Google Clicks**
  - **Facebook Impressions**
  - **Quantity Data**
- Used Pandas' `merge` method to join the datasets on the **`Day Index`** column. Because `merge` defaults to an inner join, only days present in all three datasets appear in the result.

The result is a single DataFrame that places the columns of all three sources side by side for each matching `Day Index` value, ready for analysis and visualization.


In [None]:
# Merge the three DataFrames on the common column "Day Index"
# This combines all data into a single master dataset
master_dataset = google_clicks.merge(fb_impressions, on="Day Index").merge(quantity, on="Day Index")
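
The chained `merge` above relies on the default inner join, so a day missing from any one source is dropped. The sketch below illustrates that on made-up frames (every column name other than `Day Index`, and all values, are hypothetical) and shows two optional `merge` arguments, `how` and `validate`, that make the behaviour explicit.

```python
import pandas as pd

# Hypothetical stand-ins for the three sources
clicks_demo = pd.DataFrame({"Day Index": [1, 2, 3], "Clicks": [10, 20, 30]})
impressions_demo = pd.DataFrame({"Day Index": [1, 2, 3], "Impressions": [100, 200, 300]})
quantity_demo = pd.DataFrame({"Day Index": [1, 2], "Quantity": [5, 7]})  # day 3 missing

# Same chained pattern as the notebook; validate= raises MergeError if a key repeats
inner = (
    clicks_demo
    .merge(impressions_demo, on="Day Index", how="inner", validate="one_to_one")
    .merge(quantity_demo, on="Day Index", how="inner", validate="one_to_one")
)

# An outer join keeps every day and fills the gaps with NaN
outer = (
    clicks_demo
    .merge(impressions_demo, on="Day Index", how="outer")
    .merge(quantity_demo, on="Day Index", how="outer")
)

print(len(inner), len(outer))          # 2 3
print(outer["Quantity"].isna().sum())  # 1
```

Comparing the row counts of the inner and outer results is a quick way to see whether any days were lost in the merge.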

# Saving the Master Dataset

After merging the datasets, I saved the resulting master dataset to an Excel file for future use. Here's what I did:

- Used the `to_excel` method to export the merged DataFrame to an Excel file.
- Saved the file at the following location:
  `C:/Users/nitin/OneDrive/Documents/Infosys Springboard Internship Files/NitinMishra-Infosys-Nov24/datasets/master_dataset/master_dataset.xlsx`
- Passed `index=False` so that the DataFrame's row index is **not written** as an extra column in the spreadsheet.

This step makes the unified dataset accessible for subsequent analyses and visualizations.


In [None]:
# Save the merged dataset to an Excel file in the specified directory
master_dataset.to_excel("C:/Users/nitin/OneDrive/Documents/Infosys Springboard Internship Files/NitinMishra-Infosys-Nov24/datasets/master_dataset/master_dataset.xlsx", index=False)
# Print a confirmation message when the process is complete
print("Done")
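
`to_excel` does not create missing folders, so the `master_dataset` directory has to exist before the export runs. Below is a minimal sketch of one way to guarantee that with `pathlib`. The helper name `prepare_output_path` is hypothetical, and a temporary directory stands in for the real project folder.

```python
import tempfile
from pathlib import Path


def prepare_output_path(path: str) -> Path:
    """Create the parent folder of `path` if needed and return `path` as a Path."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    return out


# Demo against a temporary directory instead of the real project folder
with tempfile.TemporaryDirectory() as tmp:
    target = prepare_output_path(f"{tmp}/master_dataset/master_dataset.xlsx")
    parent_created = target.parent.is_dir()
    print(parent_created)
```

The returned path can then be passed directly to the export, for example `master_dataset.to_excel(target, index=False)`.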