**Assignment Submission Guidelines**

**1. Submission Platform:**

- Submit your completed assignment through Google Classroom.

**2. Submission Format:**

- Submit the Google Colab Notebook (.ipynb file) provided as the assignment template.
- Do not create a new notebook. Fill in the provided template.

**3. Template Completion:**

The template notebook contains:
- The code to generate the export_data.csv and import_data.csv datasets.
- Placeholders for your code and explanations for each question.

Follow the instructions within the template.
- Code Cells:
  - Place your code solutions directly in the designated code cells below each question.
- Markdown Cells:
  - Provide your explanations and justifications in the designated Markdown cells.
- Report section:
  - Complete the markdown section at the bottom of the notebook titled "Report".
  - In this section, compile the explanation of each of the questions.
  - Answer the data analysis questions listed under "Report" at the end of this assignment.

- Do not modify the structure of the template notebook.

**4. File Naming:**

Ensure the file name remains as provided in the template. Do not rename the file.

**5. Timely Submission:**

- Submit your completed template notebook by the deadline: **24th of March, 2025**.
- Late submissions will be penalized as follows:
  - Submissions received after the deadline but by **5:00 pm on 26th of March, 2025** will receive a maximum of 5 marks for timely submission.
  - Submissions received after that time will receive 0 marks for timely submission.

**6. Report:**

- Complete the "Report" section at the end of your notebook.
- Ensure your report is:
  - Well-organized and easy to read.
  - Clear and concise.
  - Free of grammatical errors.

**7. Code Execution:**

Ensure your completed notebook runs without errors from top to bottom.
Before submitting, restart the kernel and run all cells to confirm reproducibility.



**8. Academic Integrity:**

All work must be your own.
Plagiarism will result in a failing grade.
Cite any external resources you use.



**Tips for Success:**

- Start the assignment early.
- Read the instructions within the template carefully.
- Plan your approach before coding.
- Test your code thoroughly.
- Document your work clearly.
- Review the rubrics to understand the grading criteria.


**Grading Rubrics:**

Total 50 Marks

- Timely Submission: 10 Marks
- Report: 10 Marks
- Level 1 (Basic Questions): 5 Marks (1 x 5 = 5)
- Level 2 (Intermediate Questions): 10 Marks (2 x 5 = 10)
- Level 3 (Advanced Questions): 15 Marks (3 x 5 = 15)

**Assignment: International Trade Analysis**

**Background:**

You are a data analyst working for "Global Trade Insights," a firm specializing in international trade analytics. Global Trade Insights partners with governments, businesses, and international organizations to provide data-driven insights into trade patterns and trends. Your team has been tasked with analyzing datasets containing details of international exports and imports. These datasets, compiled from raw sources, contain information on global trade transactions, including trade values, product categories, and country details.

Your goal is to leverage this data to uncover key patterns in international trade and identify opportunities for improved trade policies and business strategies. By identifying these trends, you can provide actionable recommendations to stakeholders for better trade facilitation, market access, and economic growth.

In [None]:
pip install Faker

In [None]:
import pandas as pd
import numpy as np
from faker import Faker
import random

# Initialize Faker
fake = Faker()

# Generate Export Data
df_export = pd.DataFrame({
    'export_id': range(1, 501),
    'exporter_country': [fake.country() for _ in range(500)],
    'importer_country': [fake.country() for _ in range(500)],
    'product_category': np.random.choice(['Electronics', 'Agriculture', 'Textiles', 'Machinery', 'Pharmaceuticals'], 500),
    'trade_value': np.round(np.random.uniform(1000, 500000, 500), 2),
    'currency': np.random.choice(['USD', 'EUR', 'GBP', 'JPY', 'INR'], 500),
    'export_date': [fake.date_this_decade() for _ in range(500)]
})

df_export.to_csv('export_data.csv', index=False)

# Generate Import Data
df_import = pd.DataFrame({
    'import_id': range(1, 501),
    'importer_country': [fake.country() for _ in range(500)],
    'exporter_country': [fake.country() for _ in range(500)],
    'product_category': np.random.choice(['Electronics', 'Agriculture', 'Textiles', 'Machinery', 'Pharmaceuticals'], 500),
    'trade_value': np.round(np.random.uniform(1000, 500000, 500), 2),
    'currency': np.random.choice(['USD', 'EUR', 'GBP', 'JPY', 'INR'], 500),
    'import_date': [fake.date_this_decade() for _ in range(500)]
})

df_import.to_csv('import_data.csv', index=False)

print("Synthetic datasets generated: 'export_data.csv' and 'import_data.csv'")


**Data**

export_data.csv:
- export_id: Unique identifier for each export transaction (integer).
- exporter_country: Country exporting the goods (string).
- importer_country: Country importing the goods (string).
- product_category: Category of goods exported (string: Electronics, Agriculture, Textiles, Machinery, Pharmaceuticals).
- trade_value: Monetary value of the exported goods (float).
- currency: Currency in which trade value is recorded (string: USD, EUR, GBP, JPY, INR).
- export_date: Date of the export transaction (date/string).

import_data.csv:
- import_id: Unique identifier for each import transaction (integer).
- importer_country: Country receiving the goods (string).
- exporter_country: Country sending the goods (string).
- product_category: Category of goods imported (string: Electronics, Agriculture, Textiles, Machinery, Pharmaceuticals).
- trade_value: Monetary value of the imported goods (float).
- currency: Currency in which trade value is recorded (string: USD, EUR, GBP, JPY, INR).
- import_date: Date of the import transaction (date/string).

## **Beginner (RBT Levels: 2, 3):**

Total: 5 Marks

Each question carries 1 mark.

**1. Data Loading and Exploration:**

- Load export_data.csv and import_data.csv into Pandas DataFrames.
- Display the first 5 rows and use .info() to display data types.
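As a hedged illustration of the loading step (not the expected answer), the sketch below reads a small in-memory CSV instead of the real file. The sample rows and the name `df_sample` are hypothetical; in the notebook you would pass the actual file name to `pd.read_csv`.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for export_data.csv
sample_csv = io.StringIO(
    "export_id,exporter_country,trade_value,export_date\n"
    "1,India,1200.50,2023-01-15\n"
    "2,Japan,98000.00,2024-06-30\n"
)

df_sample = pd.read_csv(sample_csv)

print(df_sample.head())  # shows up to the first 5 rows
df_sample.info()         # column names, non-null counts, and dtypes
```

Note that `read_csv` does not parse date columns unless asked to, so a date column may initially load as text.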


In [None]:
# Data Loading and Exploration:
# Load export_data.csv and import_data.csv into Pandas DataFrames.
# Display the first 5 rows and use .info() to display data types.
# Your Code Here

**Explanation**

[Your explanation here]

**2. Data Merging:**
- Merge the two datasets using an inner join on exporter_country and importer_country.
- Display the first 5 rows of the merged DataFrame.
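The following sketch shows how an inner join on two key columns behaves, using two tiny hypothetical DataFrames (`exports`, `imports`) rather than the assignment data. The `suffixes` argument is an optional choice made here to keep the two `trade_value` columns distinguishable.

```python
import pandas as pd

# Toy data: only the India -> Japan pair appears in both tables
exports = pd.DataFrame({
    "exporter_country": ["India", "Japan", "Chile"],
    "importer_country": ["Japan", "India", "Peru"],
    "trade_value": [100.0, 200.0, 300.0],
})
imports = pd.DataFrame({
    "exporter_country": ["India", "Japan"],
    "importer_country": ["Japan", "Kenya"],
    "trade_value": [150.0, 250.0],
})

# Inner join keeps only rows whose (exporter, importer) pair exists in both
merged = pd.merge(
    exports,
    imports,
    on=["exporter_country", "importer_country"],
    how="inner",
    suffixes=("_export", "_import"),
)
print(merged.head())
```

Because the countries in the generated datasets are drawn at random, matching country pairs may be rare, so the merged result could be much smaller than either input.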

In [None]:
# Data Merging:
# Merge the two datasets using an inner join on exporter_country and importer_country.
# Display the first 5 rows of the merged DataFrame.
# Your Code Here

**Explanation**

[Your explanation here]

**3. Missing Value Identification:**

Identify columns with missing values and report the count of missing values in each.
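One possible way to count missing values per column is sketched below on a hypothetical toy DataFrame (`df_demo`); it combines `isna()` with `sum()` and then keeps only the columns with a non-zero count.

```python
import numpy as np
import pandas as pd

# Toy data with deliberately missing entries
df_demo = pd.DataFrame({
    "currency": ["USD", None, "EUR", None],
    "trade_value": [100.0, np.nan, 300.0, 400.0],
    "export_id": [1, 2, 3, 4],
})

missing_counts = df_demo.isna().sum()                    # count per column
cols_with_missing = missing_counts[missing_counts > 0]   # drop complete columns
print(cols_with_missing)
```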

In [None]:
# Missing Value Identification:
# Identify columns with missing values and report the count of missing values in each.
# Your Code Here

**Explanation**

[Your explanation here]

**4. Duplicate Row Removal:**

Check for and remove any duplicate rows in the export_data.csv DataFrame.
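A minimal sketch of checking for and dropping exact duplicate rows, using a hypothetical toy DataFrame. Here a row counts as a duplicate only if every column matches an earlier row, which is the default behaviour of `duplicated()`.

```python
import pandas as pd

# Toy data: the second and third rows are identical
df_demo = pd.DataFrame({
    "export_id": [1, 2, 2, 3],
    "exporter_country": ["India", "Japan", "Japan", "Chile"],
})

n_duplicates = df_demo.duplicated().sum()                   # rows repeating an earlier row
deduped = df_demo.drop_duplicates().reset_index(drop=True)  # keep first occurrence

print(f"Duplicate rows found: {n_duplicates}")
print(deduped)
```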

In [None]:
# Duplicate Row Removal:
# Check for and remove any duplicate rows in the export_data.csv DataFrame.
# Your Code Here

**Explanation**

[Your explanation here]

**5. Basic Column Renaming:**

Rename the trade_value column in both datasets to trade_amount.

In [None]:
# Basic Column Renaming:
# Rename the trade_value column in both datasets to trade_amount.
# Your Code Here

**Explanation**

[Your explanation here]

## **Intermediate (RBT Levels: 3, 4):**

Total: 10 Marks

Each question carries 2 marks.



**6. Missing Value Imputation:**

- Impute missing values in the currency column with the mode (most frequent currency).
- Impute missing values in export_date and import_date with the most recent date.
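The two imputation rules above can be sketched as follows on a hypothetical toy DataFrame. The sketch assumes that "most recent date" means the maximum date present in the column, and it takes the first value returned by `mode()` if several currencies tie.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "currency": ["USD", "EUR", None, "USD"],
    "export_date": ["2023-01-15", None, "2024-06-30", "2022-11-01"],
})

# Mode imputation: mode() returns a Series, so take its first entry
most_common = df_demo["currency"].mode()[0]
df_demo["currency"] = df_demo["currency"].fillna(most_common)

# Date imputation: parse to datetime first so max() compares dates, not text
df_demo["export_date"] = pd.to_datetime(df_demo["export_date"])
latest = df_demo["export_date"].max()
df_demo["export_date"] = df_demo["export_date"].fillna(latest)

print(df_demo)
```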


In [None]:
# Missing Value Imputation
# Impute missing values in the currency column with the mode (most frequent currency).
# Impute missing values in export_date and import_date with the most recent date.
# Your Code Here

**Explanation**

[Your explanation here]

**7. Categorical Data Conversion:**

Apply one-hot encoding to the product_category column in both datasets.
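As an illustration of one-hot encoding, the sketch below applies `pd.get_dummies` to a hypothetical toy DataFrame. The `category` prefix is an arbitrary choice for this example.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "export_id": [1, 2, 3],
    "product_category": ["Textiles", "Machinery", "Textiles"],
})

# Replaces product_category with one indicator column per distinct value
encoded = pd.get_dummies(df_demo, columns=["product_category"], prefix="category")
print(encoded)
```

Each distinct category becomes its own indicator column, and the original `product_category` column is dropped from the result.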


In [None]:
# Categorical Data Conversion:
# Apply one-hot encoding to the product_category column in both datasets.
# Your Code Here


**Explanation**

[Your explanation here]

**8. String Manipulation:**

Convert the exporter_country and importer_country columns to uppercase.

In [None]:
# String Manipulation:
# Convert the exporter_country and importer_country columns to uppercase.
# Your Code Here

**Explanation**

[Your explanation here]

**9. Discretization and Binning:**

Create a new categorical column called trade_amount_category by binning the trade_amount into "Low", "Medium", and "High" categories.
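The question does not fix the bin edges, so the sketch below assumes equal-frequency bins (tertiles) via `pd.qcut` on hypothetical toy values. Fixed edges with `pd.cut` would be an equally valid reading; whichever you choose, justify it in your explanation.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "trade_amount": [1000.0, 5000.0, 20000.0, 80000.0, 250000.0, 490000.0],
})

# Assumption: three equal-frequency bins; labels are ordered low to high
df_demo["trade_amount_category"] = pd.qcut(
    df_demo["trade_amount"],
    q=3,
    labels=["Low", "Medium", "High"],
)
print(df_demo)
```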

In [None]:
# Discretization and Binning:
# Create a new categorical column called trade_amount_category by binning the trade_amount into "Low", "Medium",
# and "High" categories.
# Your Code Here

**Explanation**

[Your explanation here]

**10. Outlier Detection:**

Use the IQR method to identify outliers in the trade_amount column.
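A minimal sketch of the IQR rule on hypothetical toy values, assuming the conventional 1.5 × IQR fences (the multiplier is a convention, not something the assignment specifies).

```python
import pandas as pd

df_demo = pd.DataFrame({
    "trade_amount": [10.0, 12.0, 13.0, 14.0, 15.0, 16.0, 18.0, 100.0],
})

q1 = df_demo["trade_amount"].quantile(0.25)
q3 = df_demo["trade_amount"].quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR below Q1 and above Q3
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = df_demo[(df_demo["trade_amount"] < lower) | (df_demo["trade_amount"] > upper)]
print(outliers)
```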

In [None]:
# Outlier Detection:
# Use the IQR method to identify outliers in the trade_amount column.
# Your Code Here

**Explanation**

[Your explanation here]

## **Advanced (RBT Levels: 4, 5):**

Total: 15 Marks

Each question carries 3 marks.

**11. Grouped Aggregation:**

Group exports by exporter_country and calculate the total trade_amount for each country.
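The grouping step can be sketched as below on a hypothetical toy DataFrame. Sorting the totals in descending order is an optional extra added here to make the largest exporters easy to spot.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "exporter_country": ["India", "Japan", "India", "Chile"],
    "trade_amount": [100.0, 200.0, 50.0, 300.0],
})

# One total per country, largest first
totals = (
    df_demo.groupby("exporter_country")["trade_amount"]
    .sum()
    .sort_values(ascending=False)
)
print(totals)
```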

In [None]:
# Grouped Aggregation:
# Group exports by exporter_country and calculate the total trade_amount for each country.
# Your Code Here

**Explanation**

[Your explanation here]

**12. Grouped Transformation:**

Normalize the trade_amount within each product_category using z-scores.
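One way to compute within-group z-scores is with `groupby(...).transform`, sketched below on hypothetical toy data. Note that pandas' `std` uses the sample standard deviation (`ddof=1`) by default; whether that or the population version is intended is left open by the question.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "product_category": ["Textiles"] * 3 + ["Machinery"] * 3,
    "trade_amount": [10.0, 20.0, 30.0, 100.0, 200.0, 300.0],
})

grouped = df_demo.groupby("product_category")["trade_amount"]

# transform() returns a value per row, aligned with the original index
group_mean = grouped.transform("mean")
group_std = grouped.transform("std")  # sample std (ddof=1)

df_demo["trade_amount_z"] = (df_demo["trade_amount"] - group_mean) / group_std
print(df_demo)
```

After this transformation each category has mean 0 and standard deviation 1, so amounts become comparable across categories with very different scales.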

In [None]:
# Grouped Transformation:
# Normalize the trade_amount within each product_category using z-scores.
# Your Code Here

**Explanation**

[Your explanation here]

**13. Time Series Analysis:**

- Convert export_date and import_date to datetime objects.
- Group exports by month and calculate the average trade_amount for each month.
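The sketch below illustrates the conversion and monthly grouping on hypothetical toy data. It assumes "by month" means calendar year-month (January 2023 and January 2024 are kept separate); grouping on `dt.month` instead would pool the same month across years.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "export_date": ["2023-01-05", "2023-01-20", "2023-02-10", "2024-01-15"],
    "trade_amount": [100.0, 300.0, 500.0, 700.0],
})

# Parse text dates into datetime64 values
df_demo["export_date"] = pd.to_datetime(df_demo["export_date"])

# Group on the year-month period of each date and average the amounts
monthly_avg = (
    df_demo.groupby(df_demo["export_date"].dt.to_period("M"))["trade_amount"].mean()
)
print(monthly_avg)
```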

In [None]:
# Time Series Analysis:
# Convert export_date and import_date to datetime objects.
# Group exports by month and calculate the average trade_amount for each month.
# Your Code Here

**Explanation**

[Your explanation here]

**14. Correlation Analysis:**
- Calculate the correlation between trade_amount and the number of characters in product_category.
- Calculate the correlation between export and import trade values.
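Both correlations can be sketched with `Series.corr`, which computes Pearson correlation by default. The toy DataFrames and the column names `category_length`, `trade_amount_export`, and `trade_amount_import` are hypothetical; the second part assumes exports and imports have already been paired row by row, for example through a merge.

```python
import pandas as pd

# Part 1: trade amount vs. length of the category name
df_demo = pd.DataFrame({
    "product_category": ["Textiles", "Machinery", "Electronics", "Agriculture"],
    "trade_amount": [100.0, 200.0, 300.0, 250.0],
})
df_demo["category_length"] = df_demo["product_category"].str.len()
length_corr = df_demo["trade_amount"].corr(df_demo["category_length"])

# Part 2: paired export and import values (assumed to come from a merged table)
merged_demo = pd.DataFrame({
    "trade_amount_export": [100.0, 200.0, 300.0],
    "trade_amount_import": [110.0, 190.0, 320.0],
})
pair_corr = merged_demo["trade_amount_export"].corr(merged_demo["trade_amount_import"])

print(f"Amount vs. name length: {length_corr:.3f}")
print(f"Export vs. import:      {pair_corr:.3f}")
```

On randomly generated data, correlations like these are expected to be close to zero, which is itself worth discussing in your explanation.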

In [None]:
# Correlation Analysis:
# Calculate the correlation between trade_amount and the number of characters in product_category.
# Calculate the correlation between export and import trade values.
# Your Code Here

**Explanation**

[Your explanation here]

**15. Conditional Logic and Feature Engineering:**

Create a new column called high_trade_country that indicates whether a country's total trade amount (exports + imports) is above a certain threshold.
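A hedged sketch of this feature on hypothetical toy data: per-country export and import totals are summed, then compared with a threshold. The value of `THRESHOLD` is arbitrary here; the question leaves the choice to you (a quantile of the totals would be one defensible option).

```python
import pandas as pd

exports = pd.DataFrame({
    "exporter_country": ["India", "Japan", "India"],
    "trade_amount": [100.0, 400.0, 50.0],
})
imports = pd.DataFrame({
    "importer_country": ["India", "Chile", "Japan"],
    "trade_amount": [200.0, 30.0, 10.0],
})

# Total exports and total imports per country
export_totals = exports.groupby("exporter_country")["trade_amount"].sum()
import_totals = imports.groupby("importer_country")["trade_amount"].sum()

# fill_value=0 keeps countries that appear on only one side
total_trade = export_totals.add(import_totals, fill_value=0)

THRESHOLD = 400.0  # hypothetical cut-off, chosen only for this example

# Look up each row's country total and compare it with the threshold
exports["high_trade_country"] = exports["exporter_country"].map(total_trade) > THRESHOLD
print(exports)
```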

In [None]:
# Conditional Logic and Feature Engineering:
# Create a new column called high_trade_country that indicates whether a country's
# total trade amount (exports + imports) is above a certain threshold.
# Your Code Here

**Explanation**

[Your explanation here]

**Report:**

- Part 1: Compile the explanations for each question.
- Part 2: Answer the following data analysis questions:
  1. What are the key characteristics of the export and import data?
  2. What are the main trends in international trade?
  3. How do product categories influence trade amounts?
  4. Identify and discuss any potential data quality issues.
  5. Provide 2-3 actionable business or policy insights based on your analysis.