<a href="https://colab.research.google.com/github/animesh-11/AI_ML/blob/main/File_Handling_Starter_Submission.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Objective
The objective of this assignment is to read data from common file formats (.csv, .txt, .json), perform operations on this data, and save the modified data back in its original file formats.

# Pipeline that needs to be followed

The overall objective of this project is to create a system for managing product information in an e-commerce platform. The different stages involved in the process are outlined below:

### 1. Set up project and load data

&emsp;**1.1** Import required libraries  
&emsp;**1.2** Load the data

### 2. Create or update data

&emsp;**2.1** Add or update sales data  
&emsp;**2.2** Add or update product details  
&emsp;**2.3** Add or update product description  
&emsp;**2.4** Update function

### 3. Save data to disk

&emsp;**3.1** Save data to disk


## **1.**  Set up project and load data  <font color = red>[15 marks]</font>

In this stage, you will set up the environment for this assignment by loading the required modules and files. You will explore the files by displaying their content.

### **1.1** - Import required modules  <font color = red>[5 marks]</font>

### Description
In this task, you will import all the necessary modules and packages required for performing various operations in the project.

In [1]:
# Use this cell to import all the required packages and methods

# Import package for navigating through files stored on your device/on Google Colaboratory
import os

# Import package for working with JSON files
import json

# Import packages for working with CSV files and tabular data (pandas DataFrames)
import csv
import pandas as pd

### **1.2** Load the data  <font color = red>[10 marks]</font>

### Description
In this task, you will write a function that loads the necessary files into the environment. To index the data, you will use a unique product identifier called the SKU (stock keeping unit).

This includes loading sales data from a CSV file, product details from JSON files, and product descriptions from text files. We recommend using either Jupyter Notebook or Google Colab to build and execute your code.

First, if you are using Google Colab, mount Google Drive to your VM. If not, skip this cell or comment it out.

In [2]:
# Use this cell to write your code for mounting your Google Drive
# Note: If you are not using Google Colab, please skip this cell

# In case you are using Google Colab, mount your Google Drive before moving on
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


If you are using Colab, you need to unzip the archive after mounting the drive so that all the files inside it are extracted. You only need to do this once, so we recommend commenting out the code for this step after it has run.

In [None]:
# Use this cell to write your code for unzipping the data and storing it in Google Drive
# Note: If you are not using Google Colab, please skip this cell
# Note: You can comment out this cell after running it once

# Unzip your files and store them in your drive
# !unzip /content/drive/MyDrive/File_Handling_Project/mainfolder.zip

**Alternatively,** you can upload files to the Google Colab runtime environment without mounting Google Drive. In this case, you will always work in the same directory inside your Google Colab runtime, and files will be saved to the runtime rather than to your Google Drive.
The files you upload will be available until you delete the runtime.

In [2]:
# Use this cell to write your code for uploading the zip file
# Note: If you are not using Google Colab, please skip this cell

# Upload the zip file to Google Colab runtime
from google.colab import files
uploaded = files.upload()

Saving mainfolder.zip to mainfolder.zip


After uploading your zip file to the Google Colab runtime, unzip it to extract all the files inside it.

In [3]:
# Use this cell to write your code for unzipping the data and storing it in Google Colab runtime
# Note: If you are not using Google Colab, please skip this cell
# Note: You can comment out this cell after running it once

# Unzip your files and store them in Google Colab runtime
!unzip /content/mainfolder.zip

Archive:  /content/mainfolder.zip
   creating: mainfolder/
   creating: mainfolder/product_descriptions/
  inflating: mainfolder/product_descriptions/description_AISJDKFJW93NJ.txt  
  inflating: mainfolder/product_descriptions/description_DJKFIEI432FIE.txt  
  inflating: mainfolder/product_descriptions/description_GGOENEBJ079499.txt  
  inflating: mainfolder/product_descriptions/description_HJSKNWK429DJE.txt  
  inflating: mainfolder/product_descriptions/description_JFKL3940NFKLJ.txt  
  inflating: mainfolder/product_descriptions/description_LKDFJ49LSDJKL.txt  
  inflating: mainfolder/product_descriptions/description_MWKDI3JFK39SL.txt  
  inflating: mainfolder/product_descriptions/description_NEKFJOWE9FDIW.txt  
  inflating: mainfolder/product_descriptions/description_OWEJL398FWJLK.txt  
  inflating: mainfolder/product_descriptions/description_XPLFJW2490XJN.txt  
   creating: mainfolder/product_details/
  inflating: mainfolder/product_details/details_AISJDKFJW93NJ.json  
  inflating: m
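You can also extract the archive without the `!unzip` shell command by using Python's standard-library `zipfile` module. The sketch below assumes the archive is at `/content/mainfolder.zip`, as in the cell above. The helper name `extract_archive` is hypothetical and is not part of the assignment template.

```python
import os
import zipfile


def extract_archive(zip_path, destination):
    """Extract every member of zip_path into destination and return the member names."""
    os.makedirs(destination, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # namelist() lists every file (and folder entry) stored in the archive
        names = archive.namelist()
        archive.extractall(destination)
    return names


# Usage in Colab (extracts into /content, which creates /content/mainfolder):
# extract_archive('/content/mainfolder.zip', '/content')
```

Like `!unzip`, this only needs to run once per runtime.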

Now define the *load_data()* function.

In [4]:
def load_data(main_folder):
    """
    Loads sales data from a CSV file, product details from JSON files,
    and product descriptions from text files within a specified main folder.

    Args:
        main_folder (str): The path to the main folder containing the data files.

    Returns:
        tuple: A tuple containing three elements:
            - product_details (dict): A dictionary where keys are SKUs and
                                       values are dictionaries of product details.
            - sales_data (pd.DataFrame): A pandas DataFrame containing sales data.
            - product_descriptions (dict): A dictionary where keys are SKUs and
                                          values are strings of product descriptions.
    """
    sales_data_path = os.path.join(main_folder, 'sales_data.csv')
    product_details_folder = os.path.join(main_folder, 'product_details')
    product_descriptions_folder = os.path.join(main_folder, 'product_descriptions')

    # Load sales data from CSV
    sales_data = pd.read_csv(sales_data_path)

    # Load product details from JSON files
    product_details = {}
    for filename in os.listdir(product_details_folder):
        if filename.endswith('.json'):
            sku = filename.replace('details_', '').replace('.json', '')
            filepath = os.path.join(product_details_folder, filename)
            with open(filepath, 'r') as f:
                product_details[sku] = json.load(f)

    # Load product descriptions from text files
    product_descriptions = {}
    for filename in os.listdir(product_descriptions_folder):
        if filename.endswith('.txt'):
            sku = filename.replace('description_', '').replace('.txt', '')
            filepath = os.path.join(product_descriptions_folder, filename)
            with open(filepath, 'r') as f:
                product_descriptions[sku] = f.read()

    return product_details, sales_data, product_descriptions
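`load_data()` gets each SKU from its filename with `str.replace()`. That removes *every* occurrence of `details_` or `.json`, not only the leading prefix and the trailing extension. A slightly stricter variant strips only the ends of the name and skips files that don't match the pattern. It assumes Python 3.9+ for `str.removeprefix()` and `str.removesuffix()`. The helper `sku_from_filename` is hypothetical.

```python
def sku_from_filename(filename, prefix, extension):
    """Return the SKU in a name such as 'details_<SKU>.json', or None if the name doesn't match."""
    if not (filename.startswith(prefix) and filename.endswith(extension)):
        return None  # skip files that don't follow the naming pattern
    # removeprefix/removesuffix (Python 3.9+) strip only the ends of the string
    return filename.removeprefix(prefix).removesuffix(extension)


print(sku_from_filename('details_AISJDKFJW93NJ.json', 'details_', '.json'))      # AISJDKFJW93NJ
print(sku_from_filename('description_XPLFJW2490XJN.txt', 'description_', '.txt'))  # XPLFJW2490XJN
print(sku_from_filename('notes.md', 'details_', '.json'))                         # None

# Contrast with the replace() approach on a contrived name that contains '.json' twice
name = 'details_AB.jsonCD.json'
print(name.replace('details_', '').replace('.json', ''))  # ABCD
print(sku_from_filename(name, 'details_', '.json'))       # AB.jsonCD
```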

Load your data here.

In [5]:
# Use this cell to load the files
main_folder_address = '/content/mainfolder'
product_details, sales_data, product_descriptions = load_data(main_folder_address)

## **2.** Update data  <font color = red>[25 marks]</font>
In this stage, you will define a function `update()` to add sales data, product details, and product descriptions for a new product or update an existing product. If the product does not exist, the function will default to creating a new product. If the product exists, the function will instead update that product. You will also define some sub-functions to complete smaller tasks.

### **2.1** Update sales data  <font color = red>[5 marks]</font>

### Description
In this task, you will write a function to add sales data for a new product or update sales data for an existing product given the SKU and the quantities that need to be added or updated.

In [6]:
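def update_sales_data(sales_data, sku, quantities):
    """
    Adds or updates sales data for a product in the sales_data DataFrame.

    Args:
        sales_data (pd.DataFrame): The DataFrame containing sales data.
        sku (str): The SKU of the product to add or update.
        quantities (int, list or pd.Series): A single quantity to use for every day,
            or one quantity per day column (Day1, Day2, ...).

    Returns:
        pd.DataFrame: The updated sales_data DataFrame.
    """
    # Every column except Product_SKU holds a daily quantity
    day_columns = [col for col in sales_data.columns if col != 'Product_SKU']

    # Expand a single number into one value per day so that both branches below work
    if pd.api.types.is_number(quantities):
        quantities = [quantities] * len(day_columns)
    quantities = list(quantities)
    if len(quantities) != len(day_columns):
        raise ValueError(f'Expected {len(day_columns)} quantities, got {len(quantities)}')

    if sku in sales_data['Product_SKU'].values:
        # Update existing product: overwrite its day columns
        sales_data.loc[sales_data['Product_SKU'] == sku, day_columns] = quantities
    else:
        # Add new product as a new row at the end of the DataFrame
        new_row = {'Product_SKU': sku, **dict(zip(day_columns, quantities))}
        sales_data = pd.concat([sales_data, pd.DataFrame([new_row])], ignore_index=True)
    return sales_data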
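Another design is to index the sales table by `Product_SKU`. The sketch below uses a small made-up DataFrame rather than the assignment's data. With the SKU as the row label, pandas' `.loc` handles both cases in a single assignment. Assigning to an existing label overwrites that row. Assigning to a new label appends a row, which pandas calls "setting with enlargement".

```python
import pandas as pd

# Small illustrative table; the SKUs and numbers are made up
sales = pd.DataFrame(
    {'Product_SKU': ['SKU_A', 'SKU_B'], 'Day1': [10, 5], 'Day2': [12, 8], 'Day3': [15, 9]}
)

# Use the SKU as the row label so .loc can look rows up by SKU
indexed = sales.set_index('Product_SKU')

indexed.loc['SKU_A'] = [11, 13, 16]  # existing label: the row is overwritten
indexed.loc['SKU_C'] = 7             # new label: a row is appended and the scalar fills every day column

# Turn the SKU back into an ordinary column before saving with to_csv(index=False)
sales = indexed.rename_axis('Product_SKU').reset_index()
print(sales)
```

The template keeps `Product_SKU` as a regular column and branches explicitly, which is easier to follow. The indexed version removes the branch, but you have to convert back with `reset_index()` before saving.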

Check your code here.

In [8]:
sales_data = update_sales_data(sales_data,
                                'DJKFIEI432FIE',
                                22)
sales_data

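| | Product_SKU | Day1 | Day2 | Day3 | Day4 | Day5 | Day6 | Day7 | Day8 | Day9 | Day10 | Day11 | Day12 | Day13 | Day14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AISJDKFJW93NJ | 10 | 12 | 15 | 18 | 20 | 22 | 25 | 28 | 26 | 30 | 32 | 29 | 27 | 24 |
| 1 | DJKFIEI432FIE | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 | 22 |
| 2 | GGOENEBJ079499 | 15 | 18 | 22 | 25 | 28 | 20 | 17 | 23 | 19 | 21 | 24 | 27 | 18 | 20 |
| 3 | HJSKNWK429DJE | 30 | 32 | 35 | 38 | 40 | 42 | 45 | 48 | 50 | 52 | 55 | 53 | 49 | 47 |
| 4 | JFKL3940NFKLJ | 18 | 20 | 22 | 25 | 28 | 30 | 32 | 35 | 38 | 36 | 33 | 29 | 26 | 24 |
| 5 | LKDFJ49LSDJKL | 25 | 28 | 30 | 32 | 35 | 38 | 42 | 40 | 37 | 34 | 36 | 31 | 29 | 27 |
| 6 | MWKDI3JFK39SL | 30 | 35 | 40 | 45 | 50 | 42 | 37 | 38 | 41 | 36 | 33 | 39 | 40 | 44 |
| 7 | NEKFJOWE9FDIW | 12 | 15 | 18 | 20 | 22 | 24 | 21 | 23 | 25 | 28 | 30 | 27 | 26 | 29 |
| 8 | OWEJL398FWJLK | 20 | 22 | 25 | 28 | 30 | 32 | 35 | 38 | 36 | 33 | 29 | 26 | 24 | 27 |
| 9 | XPLFJW2490XJN | 5 | 8 | 9 | 12 | 15 | 10 | 14 | 16 | 20 | 18 | 22 | 25 | 19 | 21 |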


### **2.2** Update product details  <font color = red>[5 marks]</font>

### Description
In this task, you will write a function to add product details for a new product or update product details for an existing product using the product SKU.

In [9]:
def update_product_details(product_details, sku, product_info):
    """
    Adds or updates product details for a product in the product_details dictionary.

    Args:
        product_details (dict): The dictionary containing product details.
        sku (str): The SKU of the product to add or update.
        product_info (dict): A dictionary containing the product details.

    Returns:
        dict: The updated product_details dictionary.
    """
    product_details[sku] = product_info
    return product_details
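`update_product_details()` replaces the whole details dictionary for a SKU, so any field left out of `product_info` is lost. If you want a partial update that changes only the fields you pass, a dictionary merge works. This sketch shows an alternative behavior, not what the assignment asks for. `merge_product_details` is a hypothetical name, and the sample data is made up.

```python
def merge_product_details(product_details, sku, changes):
    """Update only the given fields of a product, creating the product if it doesn't exist."""
    current = product_details.get(sku, {})
    # Later keys win: fields in `changes` override those already stored
    product_details[sku] = {**current, **changes}
    return product_details


details = {'SKU_A': {'product_name': 'Example Product', 'price': '$29.99', 'availability': 'In stock'}}
merge_product_details(details, 'SKU_A', {'price': '$24.99'})
print(details['SKU_A'])
# → {'product_name': 'Example Product', 'price': '$24.99', 'availability': 'In stock'}
```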

Check your code here.

In [10]:
product_details = update_product_details(
    product_details,
    'DJKFIEI432FIE',
    {
        "product_name": "Nike Running Shoes",
        "brand": "Nike",
        "model": "Speed Ultra",
        "specifications": "Size 8, Lightweight design, Breathable material",
        "price": "$99.99",
        "availability": "In stock"
    }
)
product_details

{'MWKDI3JFK39SL': {'product_name': 'Fictional Novel',
  'brand': 'BestBooks',
  'model': None,
  'specifications': 'Paperback, 300 pages',
  'price': '$14.99',
  'availability': 'In stock'},
 'AISJDKFJW93NJ': {'product_name': 'Wall Art Print',
  'brand': 'ArtCraft',
  'model': 'NatureCanvas-1001',
  'specifications': 'Canvas print, Ready to hang',
  'price': '$49.99',
  'availability': 'In stock'},
 'GGOENEBJ079499': {'product_name': 'Smartphone',
  'brand': 'XYZ Electronics',
  'model': 'ABC-2000',
  'specifications': '6.5-inch display, 128GB storage, 16MP camera',
  'price': '$499.99',
  'availability': 'In stock'},
 'XPLFJW2490XJN': {'product_name': 'Robot Vacuum Cleaner',
  'brand': 'CleanTech',
  'model': 'AutoSweep-9000',
  'specifications': 'Smart navigation, HEPA filter, 90 minutes runtime',
  'price': '$249.99',
  'availability': 'In stock'},
 'HJSKNWK429DJE': {'product_name': 'Wireless Earbuds',
  'brand': 'SoundSync',
  'model': 'TunePro-2022',
  'specifications': 'Bluetooth

### **2.3** Update product description  <font color = red>[5 marks]</font>

### Description
In this task, you will write a function that uses the product SKU to add a description for a new product or update the description of an existing product.

In [12]:
def update_product_description(product_descriptions, sku, description):
    """Adds or updates the description text for the product with the given SKU."""
    product_descriptions[sku] = description
    return product_descriptions

Check your code here.

In [13]:
product_descriptions = update_product_description(product_descriptions,
                                                  'DJKFIEI432FIE',
                                                  "Just Do It ..NIKE")
product_descriptions

{'AISJDKFJW93NJ': "Transform your living space with ArtCraft's NatureCanvas-1001 Wall Art Print.\nThis canvas print, ready to hang, brings the beauty of nature into your home.\nWith dimensions of 16 x 20 inches and a 4.6/5 stars rating, it's a stunning addition to your decor, creating a focal point that captures attention and sparks conversation.",
 'DJKFIEI432FIE': 'Just Do It ..NIKE',
 'OWEJL398FWJLK': "Elevate your yoga practice with ZenFitness' EcoMat-500 Yoga Mat.\nFeaturing a non-slip surface, 6mm thickness, and eco-friendly materials, this high-quality mat provides the perfect foundation for your workouts.\nAvailable in Purple, Green, and Blue, it not only enhances your comfort but also adds a touch of serenity to your exercise routine.",
 'GGOENEBJ079499': 'Dive into the future with the XYZ Electronics Smartphone, model ABC-2000.\nBoasting a 6.5-inch display, 128GB storage, and a 16MP camera, this powerful device redefines the smartphone experience.\nWith a sleek design and ava

### **2.4** Update function  <font color = red>[10 marks]</font>

### Description
In this task, you will write a function that combines the functionalities of adding sales data, product details, and product description for a new product SKU, or updating these for an existing product SKU.

In [14]:
def update(product_details, sales_data, product_descriptions, sku, quantities=None, product_info=None, description=None):
    """
    Adds or updates sales data, product details, and product descriptions for a product.

    Args:
        product_details (dict): The dictionary containing product details.
        sales_data (pd.DataFrame): The DataFrame containing sales data.
        product_descriptions (dict): The dictionary containing product descriptions.
        sku (str): The SKU of the product to add or update.
        quantities (list or pd.Series, optional): Quantities for sales data. Defaults to None.
        product_info (dict, optional): Dictionary of product details. Defaults to None.
        description (str, optional): Product description. Defaults to None.

    Returns:
        tuple: The updated product_details, sales_data, and product_descriptions.
    """
    if quantities is not None:
        sales_data = update_sales_data(sales_data, sku, quantities)
    if product_info is not None:
        product_details = update_product_details(product_details, sku, product_info)
    if description is not None:
        product_descriptions = update_product_description(product_descriptions, sku, description)

    return product_details, sales_data, product_descriptions

Check your code here.

In [16]:
# Only the SKU is passed, so this call leaves all three data structures unchanged
product_details, sales_data, product_descriptions = update(product_details, sales_data, product_descriptions, "DJKFIEI432FIE")

## **3.** Save data to disk  <font color = red>[10 marks]</font>

In this stage, you will create a `dump_data()` function that saves each modified dataset in its corresponding file format: CSV for sales data, JSON for product details, and plain text (.txt) for product descriptions.



### **3.1** Save data to disk  <font color = red>[10 marks]</font>

### Description
In this task, you will implement a Python function named `dump_data()` that saves sales data, product details, and product descriptions as files in a specified directory. The function writes each type of data in its corresponding file format: a CSV file for sales data, one JSON file per product for product details, and one plain-text file per product for product descriptions. This exercise gives you practical experience with file I/O, directory management, and data serialization in Python.

In [17]:
def dump_data(sales_data, product_details, product_descriptions, main_folder):
    """
    Saves sales data, product details, and product descriptions to disk
    in their respective file formats.

    Args:
        sales_data (pd.DataFrame): The DataFrame containing sales data.
        product_details (dict): The dictionary containing product details.
        product_descriptions (dict): The dictionary containing product descriptions.
        main_folder (str): The path to the main folder to save the data.
    """
    sales_data_path = os.path.join(main_folder, 'sales_data.csv')
    product_details_folder = os.path.join(main_folder, 'product_details')
    product_descriptions_folder = os.path.join(main_folder, 'product_descriptions')

    # Save sales data to CSV (create the main folder first if it doesn't exist)
    os.makedirs(main_folder, exist_ok=True)
    sales_data.to_csv(sales_data_path, index=False)

    # Save product details to JSON files
    os.makedirs(product_details_folder, exist_ok=True)
    for sku, details in product_details.items():
        filepath = os.path.join(product_details_folder, f'details_{sku}.json')
        with open(filepath, 'w') as f:
            json.dump(details, f, indent=4)

    # Save product descriptions to text files
    os.makedirs(product_descriptions_folder, exist_ok=True)
    for sku, description in product_descriptions.items():
        filepath = os.path.join(product_descriptions_folder, f'description_{sku}.txt')
        with open(filepath, 'w') as f:
            f.write(description)
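`load_data()` reads exactly the layout that `dump_data()` writes, so a round trip is a quick way to check both functions. The sketch below uses small made-up data in a temporary folder and repeats only the write and read calls that the two functions rely on. It also shows why `index=False` matters. Without it, `to_csv()` writes the row index as an extra first column, and `read_csv()` returns that column as `Unnamed: 0`.

```python
import json
import os
import tempfile

import pandas as pd

# Made-up sample data in the same shapes the assignment uses
sales = pd.DataFrame({'Product_SKU': ['SKU_A'], 'Day1': [3], 'Day2': [5]})
details = {'SKU_A': {'product_name': 'Example', 'model': None, 'price': '$9.99'}}
descriptions = {'SKU_A': 'First line.\nSecond line.'}

with tempfile.TemporaryDirectory() as folder:
    # CSV: index=False limits the file to the Product_SKU, Day1 and Day2 columns
    csv_path = os.path.join(folder, 'sales_data.csv')
    sales.to_csv(csv_path, index=False)
    sales_back = pd.read_csv(csv_path)

    # Writing the index as well adds a nameless first column that comes back as 'Unnamed: 0'
    sales.to_csv(csv_path)
    with_index_columns = list(pd.read_csv(csv_path).columns)

    # JSON: None is written as null and read back as None
    json_path = os.path.join(folder, 'details_SKU_A.json')
    with open(json_path, 'w') as f:
        json.dump(details['SKU_A'], f, indent=4)
    with open(json_path) as f:
        details_back = json.load(f)

    # Text: newlines inside the description survive unchanged
    txt_path = os.path.join(folder, 'description_SKU_A.txt')
    with open(txt_path, 'w') as f:
        f.write(descriptions['SKU_A'])
    with open(txt_path) as f:
        description_back = f.read()

pd.testing.assert_frame_equal(sales, sales_back)  # raises if the CSV round trip changed anything
print(with_index_columns)  # ['Unnamed: 0', 'Product_SKU', 'Day1', 'Day2']
print(details_back == details['SKU_A'], description_back == descriptions['SKU_A'])  # True True
```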

Check your function here.

In [18]:
dump_data(sales_data, product_details, product_descriptions, main_folder_address)

If you added new products in stage 2, *mainfolder* will now contain new files in the `product_details` and `product_descriptions` subfolders and new rows in *sales_data.csv*. Changes to existing products, such as the updates made while checking your code, overwrite the corresponding files and rows.