## Part 1: Data Manipulation with Pandas (10-15 minutes)
#### Real-World Scenario: Sales Data Analysis
#### You have sales data from a small electronics store. Your task is to run a few analyses that help identify sales trends.

- Task 1: Load the sales data (the data dictionary defined in the code cell below) into a pandas DataFrame.
- Task 2: Convert the Date column to datetime format.
- Task 3: Add a new column Revenue that is calculated as Price * Units_Sold.
- Task 4: Filter the DataFrame to show only records where the Revenue is greater than 5,000.
- Task 5: Calculate the total revenue for each Store_Location using a groupby operation.

In [1]:
import pandas as pd

data = {
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
    'Product': ['Laptop', 'Tablet', 'Smartphone', 'Monitor', 'Laptop'],
    'Price': [1200, 300, 800, 400, 1200],
    'Units_Sold': [10, 25, 15, 7, 8],
    'Store_Location': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles']
}

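# Task 1: build the DataFrame from the dictionary
df = pd.DataFrame(data)

# Task 2: parse the date strings into datetime values
df['Date'] = pd.to_datetime(df['Date'])

# Task 3: revenue per row is unit price times units sold
df['Revenue'] = df['Price'] * df['Units_Sold']

# Task 4: keep only the rows with revenue above 5,000
df_task4 = df[df['Revenue'] > 5000]

print(df_task4)

# Task 5: total revenue per store location
df_task5 = df.groupby('Store_Location')['Revenue'].sum()

print(df_task5)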


        Date     Product  Price  Units_Sold Store_Location  Revenue
0 2023-01-01      Laptop   1200          10       New York    12000
1 2023-01-02      Tablet    300          25    Los Angeles     7500
2 2023-01-03  Smartphone    800          15       New York    12000
4 2023-01-05      Laptop   1200           8    Los Angeles     9600
Store_Location
Chicago         2800
Los Angeles    17100
New York       24000
Name: Revenue, dtype: int64
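The five tasks above can also be written as a single method chain. The following is a minimal sketch of one alternative style, not the expected solution; the names `result`, `high_revenue`, and `revenue_by_store` are made up for this illustration, and the final sort by revenue is an extra step that the tasks do not ask for.

```python
import pandas as pd

# Same sample data as in the exercise above.
data = {
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
    'Product': ['Laptop', 'Tablet', 'Smartphone', 'Monitor', 'Laptop'],
    'Price': [1200, 300, 800, 400, 1200],
    'Units_Sold': [10, 25, 15, 7, 8],
    'Store_Location': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles'],
}

# Tasks 1-3: build the frame, parse the dates, and derive Revenue in one chain.
result = pd.DataFrame(data).assign(
    Date=lambda d: pd.to_datetime(d['Date']),
    Revenue=lambda d: d['Price'] * d['Units_Sold'],
)

# Task 4: query() expresses the same filter as the boolean mask.
high_revenue = result.query('Revenue > 5000')

# Task 5: as_index=False keeps Store_Location as a regular column,
# which makes the result easier to sort (illustrative extra step).
revenue_by_store = (
    result.groupby('Store_Location', as_index=False)['Revenue']
    .sum()
    .sort_values('Revenue', ascending=False)
)

print(high_revenue)
print(revenue_by_store)
```

Whether the chained form reads better than the step-by-step cell is a matter of taste; the step-by-step version is easier to inspect one task at a time.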


## Part 2: Numpy Operations and Matrix Algebra (5-10 minutes)
#### Real-World Scenario: Linear Algebra in Machine Learning
#### You are working with data that requires linear-algebra operations. These are common in machine learning, for example when applying transformation matrices or solving for linear regression coefficients.

- Task 1: Create the following two matrices using numpy:

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[10, 11, 12], [13, 14, 15], [16, 17, 18]]

- Task 2: Compute the dot product of A and B.T (the transpose of B).
- Task 3: Compute the inverse of the 2x2 matrix C = [[1, 2], [3, 4]].
- Task 4: Use np.exp() to apply the exponential function to all elements of matrix A.


In [3]:
import numpy as np

# Task 1: define A and B as 3x3 integer arrays
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])
print(f'NumPy Task 1 A = {A} , NumPy Task 1 B = {B}')

# Task 2: matrix product of A with the transpose of B
np_task_2 = np.dot(A, B.T)
print(f'NumPy Task 2 = {np_task_2}')

# Task 3: invert the 2x2 matrix C
C = np.array([[1, 2], [3, 4]])
np_task_3 = np.linalg.inv(C)
print(f'NumPy Task 3 = {np_task_3}')

# Task 4: element-wise exponential of A
np_task_4 = np.exp(A)
print(f'NumPy Task 4 = {np_task_4}')

NumPy Task 1 A = [[1 2 3]
 [4 5 6]
 [7 8 9]] , NumPy Task 1 B = [[10 11 12]
 [13 14 15]
 [16 17 18]]
NumPy Task 2 = [[ 68  86 104]
 [167 212 257]
 [266 338 410]]
NumPy Task 3 = [[-2.   1. ]
 [ 1.5 -0.5]]
NumPy Task 4 = [[2.71828183e+00 7.38905610e+00 2.00855369e+01]
 [5.45981500e+01 1.48413159e+02 4.03428793e+02]
 [1.09663316e+03 2.98095799e+03 8.10308393e+03]]
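The results above can be sanity-checked with a few identities. This is a small sketch under the same definitions of A, B, and C. The variable names (`same_product`, `inverse_ok`, `elementwise_ok`, `rank_A`) are made up for illustration, and the rank check is an extra that the tasks do not ask for.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])
C = np.array([[1, 2], [3, 4]])

# For 2-D arrays, the @ operator computes the same matrix product as np.dot.
same_product = np.array_equal(A @ B.T, np.dot(A, B.T))

# A matrix times its inverse should be the identity, up to floating-point error,
# so compare with np.allclose rather than exact equality.
C_inv = np.linalg.inv(C)
inverse_ok = np.allclose(C @ C_inv, np.eye(2))

# np.exp works element by element: entry [0, 0] is e**1, not a matrix exponential.
elementwise_ok = np.isclose(np.exp(A)[0, 0], np.e)

# A has linearly dependent rows (row 3 = 2 * row 2 - row 1), so its rank is 2
# and it has no true inverse.
rank_A = np.linalg.matrix_rank(A)

print(same_product, inverse_ok, elementwise_ok, rank_A)
```

Checking `C @ C_inv` against the identity is a cheap habit whenever an inverse is computed numerically.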


## Part 3: Machine Learning Model with Scikit-learn (10 minutes)
#### Real-World Scenario: Predicting House Prices
#### You are building a model to predict house prices from three features: square footage, number of bedrooms, and age of the house. For simplicity, use a small made-up dataset and build a basic linear regression model.

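- Task 1: Create the feature matrix (X) and target variable (y) defined in the code cell below. Each row of X holds square footage, number of bedrooms, and age in years; y holds the price.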

- Task 2: Create and fit a LinearRegression model on this data.

- Task 3: Use the model to predict the price of a house with the following features: 2100 square feet, 4 bedrooms, and 15 years old.

- Task 4: Print the model's intercept and coefficients.

- Task 5: Calculate and print the R-squared score of the model. (Score it on the training data, for practice only.)

In [4]:
import numpy as np
from sklearn.linear_model import LinearRegression

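# Task 1: each row of X is [square footage, bedrooms, age in years]; y is the price
X = np.array([[1500, 3, 20], [1800, 4, 15], [2400, 5, 10], [1700, 3, 25], [2000, 4, 20]])
y = np.array([300000, 400000, 500000, 350000, 450000])

# Task 2: fit an ordinary least squares model
lr = LinearRegression()
lr.fit(X, y)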

# Task 3: 2100 square feet, 4 bedrooms, 15 years old
skl_task3 = lr.predict(np.array([[2100, 4, 15]]))
print(f'SKL Task 3 = {skl_task3}')

# Task 4
print(f'Task 4: Intercept = {lr.intercept_}, Coeff = {lr.coef_}')

# Task 5
print(f'Task 5: R2 Score = {lr.score(X, y)}')

SKL Task 3 = [400000.]
Task 4: Intercept = -350000.0000000007, Coeff = [2.12714566e-13 1.50000000e+05 1.00000000e+04]
Task 5: R2 Score = 1.0
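To make the fitted model less of a black box, the prediction and the R-squared score can be reproduced by hand from the intercept and coefficients. The following is a minimal sketch, assuming the same X and y as above and the house described in Task 3 (2100 square feet, 4 bedrooms, 15 years old). The names `house`, `manual_prediction`, and `manual_r2` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same training data as in the exercise: [square footage, bedrooms, age in years].
X = np.array([[1500, 3, 20], [1800, 4, 15], [2400, 5, 10], [1700, 3, 25], [2000, 4, 20]])
y = np.array([300000, 400000, 500000, 350000, 450000])

lr = LinearRegression().fit(X, y)

# A linear model predicts intercept + sum(coefficient_i * feature_i).
house = np.array([[2100, 4, 15]])
manual_prediction = lr.intercept_ + house @ lr.coef_
model_prediction = lr.predict(house)

# R-squared = 1 - (residual sum of squares / total sum of squares).
residuals = y - lr.predict(X)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(np.allclose(manual_prediction, model_prediction))
print(np.isclose(manual_r2, lr.score(X, y)))
```

With only five samples and four fitted parameters (three coefficients plus the intercept), a training score this high says little about how the model would generalize; a held-out test set would be needed for that.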
