# Module: Pandas Assignments
## Lesson: Pandas
### Assignment 1: DataFrame Creation and Indexing

1. Create a Pandas DataFrame with 4 columns and 6 rows filled with random integers. Set the index to be the first column.
2. Create a Pandas DataFrame with columns 'A', 'B', 'C' and index 'X', 'Y', 'Z'. Fill the DataFrame with random integers and access the element at row 'Y' and column 'B'.

### Assignment 2: DataFrame Operations

1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Add a new column that is the product of the first two columns.
2. Create a Pandas DataFrame with 3 columns and 4 rows filled with random integers. Compute the row-wise and column-wise sum.

### Assignment 3: Data Cleaning

1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Introduce some NaN values. Fill the NaN values with the mean of the respective columns.
2. Create a Pandas DataFrame with 4 columns and 6 rows filled with random integers. Introduce some NaN values. Drop the rows with any NaN values.

### Assignment 4: Data Aggregation

1. Create a Pandas DataFrame with 2 columns: 'Category' and 'Value'. Fill the 'Category' column with random categories ('A', 'B', 'C') and the 'Value' column with random integers. Group the DataFrame by 'Category' and compute the sum and mean of 'Value' for each category.
2. Create a Pandas DataFrame with 3 columns: 'Product', 'Category', and 'Sales'. Fill the DataFrame with random data. Group the DataFrame by 'Category' and compute the total sales for each category.

### Assignment 5: Merging DataFrames

1. Create two Pandas DataFrames with a common column. Merge the DataFrames using the common column.
2. Create two Pandas DataFrames with different columns. Concatenate the DataFrames along the rows and along the columns.

### Assignment 6: Time Series Analysis

1. Create a Pandas DataFrame with a datetime index and one column filled with random integers. Resample the DataFrame to compute the monthly mean of the values.
2. Create a Pandas DataFrame with a datetime index ranging from '2021-01-01' to '2021-12-31' and one column filled with random integers. Compute the rolling mean with a window of 7 days.

### Assignment 7: MultiIndex DataFrame

1. Create a Pandas DataFrame with a MultiIndex (hierarchical index). Perform some basic indexing and slicing operations on the MultiIndex DataFrame.
2. Create a Pandas DataFrame with MultiIndex consisting of 'Category' and 'SubCategory'. Fill the DataFrame with random data and compute the sum of values for each 'Category' and 'SubCategory'.

### Assignment 8: Pivot Tables

1. Create a Pandas DataFrame with columns 'Date', 'Category', and 'Value'. Create a pivot table to compute the sum of 'Value' for each 'Category' by 'Date'.
2. Create a Pandas DataFrame with columns 'Year', 'Quarter', and 'Revenue'. Create a pivot table to compute the mean 'Revenue' for each 'Quarter' by 'Year'.

### Assignment 9: Applying Functions

1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Apply a function that doubles the values of the DataFrame.
2. Create a Pandas DataFrame with 3 columns and 6 rows filled with random integers. Apply a lambda function to create a new column that is the sum of the existing columns.

### Assignment 10: Working with Text Data

1. Create a Pandas Series with 5 random text strings. Convert all the strings to uppercase.
2. Create a Pandas Series with 5 random text strings. Extract the first three characters of each string.


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random

### Assignment 1: DataFrame Creation and Indexing

#### 1. Create a Pandas DataFrame with 4 columns and 6 rows filled with random integers. Set the index to be the first column.

In [5]:
df = pd.DataFrame(np.random.randint(1, 50, size=(6, 4)), columns=['A', 'B', 'C', 'D'])
print("Original DataFrame : ")
print(df)

df.set_index('A',inplace=True)
print("Modified DataFrame : ")
print(df)


Original DataFrame : 
    A   B   C   D
0  12  46  44  19
1  26   2  27  35
2  20  21  27   6
3   6  14   1  32
4  37  39  46  27
5  28  41  12  24
Modified DataFrame : 
     B   C   D
A             
12  46  44  19
26   2  27  35
20  21  27   6
6   14   1  32
37  39  46  27
28  41  12  24


#### 2. Create a Pandas DataFrame with columns 'A', 'B', 'C' and index 'X', 'Y', 'Z'. Fill the DataFrame with random integers and access the element at row 'Y' and column 'B'.

In [10]:
df = pd.DataFrame(np.random.randint(1,9,size=(3,3)),columns=['A','B','C'], index=['X','Y','Z'])
print(df)
print(df.at['Y','B'])

   A  B  C
X  3  7  7
Y  4  3  3
Z  7  2  3
3


### Assignment 2: DataFrame Operations

#### 1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Add a new column that is the product of the first two columns.

In [11]:
df = pd.DataFrame(np.random.randint(1,20,size=(5,3)))
df["Product First Two"] = df[0]*df[1]
df

    0   1   2  Product First Two
0  16   9   6                144
1  16  12   1                192
2   5  15   2                 75
3   9  17   4                153
4  19   1  10                 19


#### 2. Create a Pandas DataFrame with 3 columns and 4 rows filled with random integers. Compute the row-wise and column-wise sum.

In [12]:
df = pd.DataFrame(np.random.randint(1,20,size=(4,3)))
row_sum = df.sum(axis=1)
col_sum = df.sum(axis = 0)
print("DataFrame : ")
print(df)
print("Rows Sum : ")
print(row_sum)
print("Column Sum : ")
print(col_sum)

DataFrame : 
    0   1   2
0  13  10  16
1   5   8  14
2  14   3   4
3  15  18   7
Rows Sum : 
0    39
1    27
2    21
3    40
dtype: int64
Column Sum : 
0    47
1    39
2    41
dtype: int64


### Assignment 3: Data Cleaning

#### 1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Introduce some NaN values. Fill the NaN values with the mean of the respective columns.

In [26]:
df = pd.DataFrame(np.random.randint(1,20,size=(5,3)), columns = ['A','B','C'])
print("Original DataFrame : ")
print(df)
df.iat[0,2] = np.nan
df.iat[1,1] = np.nan
df.iat[2,0] = np.nan
df.iat[4,2] = np.nan
print("DF with nan values : ")
print(df)
df.fillna(df.mean(), inplace = True)
print("Modified DF : ")
print(df)

Original DataFrame : 
    A   B   C
0  14   9  16
1  17   2   7
2   2   4  16
3   5   3   9
4  14  18  19
DF with nan values : 
      A     B     C
0  14.0   9.0   NaN
1  17.0   NaN   7.0
2   NaN   4.0  16.0
3   5.0   3.0   9.0
4  14.0  18.0   NaN
Modified DF : 
      A     B          C
0  14.0   9.0  10.666667
1  17.0   8.5   7.000000
2  12.5   4.0  16.000000
3   5.0   3.0   9.000000
4  14.0  18.0  10.666667


#### 2. Create a Pandas DataFrame with 4 columns and 6 rows filled with random integers. Introduce some NaN values. Drop the rows with any NaN values.

In [None]:
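# size=(rows, columns): (4,6) yields 4 rows x 6 columns; the task's 6 rows x 4 columns would be size=(6,4)
df = pd.DataFrame(np.random.randint(1,100, size=(4,6)), columns = ['A','B','C','D','E','F'])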
print("Original DF : ")
print(df)

df.iat[0,5] = np.nan
df.iat[2,4] = np.nan
print("DF with NaN values : ")
print(df)

df.dropna(inplace=True)
print("Modified DF : ")
print(df)

Original DF : 
    A   B   C   D   E   F
0  26  74  30  25   4  85
1  17  76  56  56  94  26
2  76  98  49  66  53  85
3  25  85  92   5  68  11
DF with NaN values : 
    A   B   C   D     E     F
0  26  74  30  25   4.0   NaN
1  17  76  56  56  94.0  26.0
2  76  98  49  66   NaN  85.0
3  25  85  92   5  68.0  11.0
Modified DF : 
    A   B   C   D     E     F
1  17  76  56  56  94.0  26.0
3  25  85  92   5  68.0  11.0
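
The task statement asks for 6 rows and 4 columns, whereas the cell above builds 4 rows by 6 columns. The following is a minimal sketch of the shape the task describes. The seeded generator `rng`, the seed value, and the NaN positions are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# 6 rows x 4 columns; cast to float so NaN can be stored in any column
df = pd.DataFrame(rng.integers(1, 100, size=(6, 4)), columns=list("ABCD")).astype(float)

# Introduce NaN values at hand-picked positions (rows 0 and 2)
df.iat[0, 3] = np.nan
df.iat[2, 1] = np.nan

# dropna() defaults to axis=0 and how='any': drop every row containing a NaN
cleaned = df.dropna()
print(cleaned)
```

Passing `axis=1` instead would drop the affected columns and keep all rows.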


### Assignment 4: Data Aggregation

#### 1. Create a Pandas DataFrame with 2 columns: 'Category' and 'Value'. Fill the 'Category' column with random categories ('A', 'B', 'C') and the 'Value' column with random integers. Group the DataFrame by 'Category' and compute the sum and mean of 'Value' for each category.

In [15]:
df = pd.DataFrame(
    {
        # Column names follow the task statement: 'Category' and 'Value'
        "Category" : np.random.choice(['A','B','C'],size = 10),
        "Value" : np.random.randint(0, 100, size = 10)
    }
)
grouped_sum = df.groupby('Category')['Value'].sum()
grouped_mean = df.groupby('Category')['Value'].mean()
print(grouped_sum)
print(grouped_mean)

Category
A    130
B    119
C    177
Name: Value, dtype: int64
Category
A    43.333333
B    59.500000
C    35.400000
Name: Value, dtype: float64
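
Both statistics can also be computed in one pass with `agg`. The sketch below uses a small hand-written frame (the values are illustrative, not taken from the run above) and the column names from the task statement.

```python
import pandas as pd

# Hand-written illustrative data
df = pd.DataFrame({
    "Category": ["A", "B", "A", "C", "B", "A"],
    "Value": [10, 20, 30, 40, 50, 60],
})

# One groupby, two aggregations: the result has one column per statistic
summary = df.groupby("Category")["Value"].agg(["sum", "mean"])
print(summary)
```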


#### 2. Create a Pandas DataFrame with 3 columns: 'Product', 'Category', and 'Sales'. Fill the DataFrame with random data. Group the DataFrame by 'Category' and compute the total sales for each category.

In [25]:
df = pd.DataFrame(
    {
        "Product" : np.random.choice(['Product1','Product2','Product3'], size = 5),
        "Category" : np.random.choice(['A','B','C'], size = 5),
        "Sales" : np.random.choice([i for i in range(5000, 10001)],size = 5),
    }
)
print("Original DF : ")
print(df)
category_sales = df.groupby('Category')['Sales'].sum()
print(category_sales)

Original DF : 
    Product Category  Sales
0  Product2        B   7568
1  Product1        B   5079
2  Product1        A   5780
3  Product2        B   6202
4  Product1        C   8812
Category
A     5780
B    18849
C     8812
Name: Sales, dtype: int64


### Assignment 5: Merging DataFrames

#### 1. Create two Pandas DataFrames with a common column. Merge the DataFrames using the common column.

In [None]:
df1 = pd.DataFrame(
    {
        "Product" : np.random.choice(['Product1','Product2','Product3'], size = 5),
        "Categories" : np.random.choice(['A','B','C'], size = 5),
        "Sales" : np.random.choice([i for i in range(5000, 10001)],size = 5),
    }
)
df2 = pd.DataFrame(
    {
        "Categories" : np.random.choice(['A','B','C'],size = 10),
        "Values" : np.random.choice([i for i in range(100)],size = 10)
    }
)
merged_df = pd.merge(df1, df2,on='Categories')
print(merged_df)

     Product Categories  Sales  Values
0   Product3          B   5792      93
1   Product3          B   5792      21
2   Product3          B   5792      66
3   Product1          C   5503      60
4   Product1          C   5503      88
5   Product1          C   5503      78
6   Product2          B   9313      93
7   Product2          B   9313      21
8   Product2          B   9313      66
9   Product3          C   5649      60
10  Product3          C   5649      88
11  Product3          C   5649      78
12  Product1          C   5089      60
13  Product1          C   5089      88
14  Product1          C   5089      78
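
The merged frame above has 15 rows even though the inputs have 5 and 10 rows: with repeated keys on both sides, every `df1` row is paired with each `df2` row that shares its category, and categories missing from either side are dropped by the default inner join. A small sketch with hand-written frames (the names `left` and `right` and their values are illustrative) shows the same effect:

```python
import pandas as pd

left = pd.DataFrame({"Categories": ["A", "B", "B"], "Sales": [100, 200, 300]})
right = pd.DataFrame({"Categories": ["B", "B", "C"], "Values": [1, 2, 3]})

# how='inner' is the default: only keys present in both frames survive
merged = pd.merge(left, right, on="Categories")
print(merged)

# 2 'B' rows on the left x 2 'B' rows on the right = 4 rows; 'A' and 'C' are dropped
print(len(merged))
```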


#### 2. Create two Pandas DataFrames with different columns. Concatenate the DataFrames along the rows and along the columns.

In [37]:
df1 = pd.DataFrame({'A': np.random.randint(1, 100, size=3), 'B': np.random.randint(1, 100, size=3)})
df2 = pd.DataFrame({'C': np.random.randint(1, 100, size=3), 'D': np.random.randint(1, 100, size=3)})
print("Original DF1 : ")
print(df1)
print("Original DF2 : ")
print(df2)
row_concate = pd.concat([df1,df2],axis=0)
print("Concatenation along Rows : ")
print(row_concate)
col_concate = pd.concat([df1,df2],axis=1)
print("Concatenation along Columns : ")
print(col_concate)

Original DF1 : 
    A   B
0  66  96
1   2  66
2  75  27
Original DF2 : 
    C   D
0  99  57
1  34  90
2  72  19
Concatenation along Rows : 
      A     B     C     D
0  66.0  96.0   NaN   NaN
1   2.0  66.0   NaN   NaN
2  75.0  27.0   NaN   NaN
0   NaN   NaN  99.0  57.0
1   NaN   NaN  34.0  90.0
2   NaN   NaN  72.0  19.0
Concatenation along Columns : 
    A   B   C   D
0  66  96  99  57
1   2  66  34  90
2  75  27  72  19


### Assignment 6: Time Series Analysis

#### 1. Create a Pandas DataFrame with a datetime index and one column filled with random integers. Resample the DataFrame to compute the monthly mean of the values.

In [None]:
date_range = pd.date_range(start='2005-01-01', end = '2005-12-31', freq='D')
df = pd.DataFrame(date_range, columns = ['date'])
df.set_index('date', inplace=True)
df['data'] = np.random.randint(1,1000, size = (len(date_range)))
print("Original DataFrame : ")
print(df)

# 'M' is the month-end alias; pandas 2.2+ emits a FutureWarning for it and prefers 'ME'
monthly_mean = df.resample('M').mean()
print(monthly_mean)

Original DataFrame : 
            data
date            
2005-01-01   385
2005-01-02   567
2005-01-03    54
2005-01-04   382
2005-01-05   382
...          ...
2005-12-27   377
2005-12-28    62
2005-12-29   178
2005-12-30   978
2005-12-31   591

[365 rows x 1 columns]
                  data
date                  
2005-01-31  464.322581
2005-02-28  560.035714
2005-03-31  506.000000
2005-04-30  555.800000
2005-05-31  457.387097
2005-06-30  570.300000
2005-07-31  441.096774
2005-08-31  460.774194
2005-09-30  440.766667
2005-10-31  532.870968
2005-11-30  463.000000
2005-12-31  523.612903


#### 2. Create a Pandas DataFrame with a datetime index ranging from '2021-01-01' to '2021-12-31' and one column filled with random integers. Compute the rolling mean with a window of 7 days.

In [50]:
date_range = pd.date_range(start = '2021-01-01',end = '2021-12-31',freq='D')
df = pd.DataFrame(date_range, columns=['Date'])
df.set_index('Date',inplace=True)
df['data'] = np.random.randint(1,100, size=(len(date_range)))
print("Original DataFrame : ")
print(df)

rolling_mean = df.rolling(window=7).mean()
print(rolling_mean)

Original DataFrame : 
            data
Date            
2021-01-01    25
2021-01-02    15
2021-01-03    99
2021-01-04    73
2021-01-05    74
...          ...
2021-12-27    37
2021-12-28    83
2021-12-29    71
2021-12-30    39
2021-12-31    87

[365 rows x 1 columns]
                 data
Date                 
2021-01-01        NaN
2021-01-02        NaN
2021-01-03        NaN
2021-01-04        NaN
2021-01-05        NaN
...               ...
2021-12-27  60.000000
2021-12-28  59.857143
2021-12-29  59.714286
2021-12-30  52.142857
2021-12-31  59.428571

[365 rows x 1 columns]
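
`window=7` counts 7 rows, which coincides with 7 days here only because the index has exactly one row per day. A time-based window such as `'7D'` states the intent directly. The sketch below compares the two on a short illustrative series (the frame `ts` and its values are assumptions for the example):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2021-01-01", periods=10, freq="D")
ts = pd.DataFrame({"data": np.arange(1, 11)}, index=idx)

# Row-based window: needs 7 observations, so the first 6 results are NaN
by_rows = ts.rolling(window=7).mean()

# Time-based window: covers the trailing 7 days; min_periods defaults to 1
by_time = ts.rolling(window="7D").mean()

print(by_rows.head(8))
print(by_time.head(8))
```

Once a full week of data is available, the two results agree on a gap-free daily index.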


### Assignment 7: MultiIndex DataFrame

#### 1. Create a Pandas DataFrame with a MultiIndex (hierarchical index). Perform some basic indexing and slicing operations on the MultiIndex DataFrame.

In [65]:
arrays = [['A','A','B','B','C','C'],['one','two','one','two','one','two']]
index = pd.MultiIndex.from_arrays(arrays,names=('Categories', 'Subcategories'))
df = pd.DataFrame(np.random.randint(1,50, size=(6,3)), index = index, columns=['Value1','Value2','Value3'])
print(df)

print("Indexing at A")
print(df.loc['A'])

print("Selecting row ('C', 'two')")
print(df.loc[('C','two')])

                          Value1  Value2  Value3
Categories Subcategories                        
A          one                26      38      38
           two                 4      15      46
B          one                23      30      25
           two                21      30       5
C          one                44      33      38
           two                14       6      30
Indexing at A
               Value1  Value2  Value3
Subcategories                        
one                26      38      38
two                 4      15      46
Selecting row ('C', 'two')
Value1    14
Value2     6
Value3    30
Name: (C, two), dtype: int32


#### 2. Create a Pandas DataFrame with MultiIndex consisting of 'Category' and 'SubCategory'. Fill the DataFrame with random data and compute the sum of values for each 'Category' and 'SubCategory'.

In [80]:
arrays = [['A','A','A','B','B','B'],[1,2,3,1,2,3]]
index = pd.MultiIndex.from_arrays(arrays,names=('Category','Subcategory'))
df = pd.DataFrame(np.random.randint(100,200,size = (6,3)), index = index, columns = ['Value1','Value2','Value3'])
print(df)
grouped_sum = df.groupby(['Category','Subcategory'])[['Value1','Value2','Value3']].sum()
print(grouped_sum)

                      Value1  Value2  Value3
Category Subcategory                        
A        1               148     188     114
         2               108     176     170
         3               113     189     100
B        1               104     106     126
         2               192     114     100
         3               182     196     163
                      Value1  Value2  Value3
Category Subcategory                        
A        1               148     188     114
         2               108     176     170
         3               113     189     100
B        1               104     106     126
         2               192     114     100
         3               182     196     163
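
The grouped result above is identical to the input because every ('Category', 'Subcategory') pair occurs exactly once, so each sum covers a single row. A sketch with repeated pairs makes the aggregation visible (the data are hand-written for illustration, and grouping by `level=` is one of several equivalent spellings):

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["A", "A", "A", "B", "B"], [1, 1, 2, 1, 1]],
    names=("Category", "Subcategory"),
)
df = pd.DataFrame({"Value": [10, 20, 30, 40, 50]}, index=index)

# Group on the index levels by name; rows sharing a pair are summed together
totals = df.groupby(level=["Category", "Subcategory"]).sum()
print(totals)
```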


### Assignment 8: Pivot Tables

#### 1. Create a Pandas DataFrame with columns 'Date', 'Category', and 'Value'. Create a pivot table to compute the sum of 'Value' for each 'Category' by 'Date'.
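
One possible approach, sketched under assumptions: the date range, the seed, and the number of rows are arbitrary choices, and `fill_value=0` is an optional extra that replaces missing Date/Category combinations with 0.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

dates = pd.date_range("2022-01-01", periods=3, freq="D")
df = pd.DataFrame({
    "Date": dates.repeat(4),  # each date appears 4 times
    "Category": rng.choice(["A", "B", "C"], size=12),
    "Value": rng.integers(1, 100, size=12),
})

# Rows = Date, columns = Category, cells = sum of Value for that combination
pivot = pd.pivot_table(
    df, values="Value", index="Date", columns="Category",
    aggfunc="sum", fill_value=0,
)
print(pivot)
```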

#### 2. Create a Pandas DataFrame with columns 'Year', 'Quarter', and 'Revenue'. Create a pivot table to compute the mean 'Revenue' for each 'Quarter' by 'Year'.
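
A possible sketch, assuming two years with two revenue records per quarter so that the mean is not trivial; the years, seed, and value range are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility

df = pd.DataFrame({
    "Year": np.repeat([2021, 2022], 8),               # 8 records per year
    "Quarter": np.tile(["Q1", "Q2", "Q3", "Q4"], 4),  # 2 records per quarter per year
    "Revenue": rng.integers(1000, 5000, size=16),
})

# Rows = Year, columns = Quarter, cells = mean Revenue
pivot = df.pivot_table(values="Revenue", index="Year", columns="Quarter", aggfunc="mean")
print(pivot)
```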


### Assignment 9: Applying Functions

#### 1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Apply a function that doubles the values of the DataFrame.

In [82]:
df = pd.DataFrame(np.random.randint(1,50,size=(5,3)),index=['A','B','C','D','E'],columns=['Value1','Value2','Value3'])
print(df)
# Column-wise apply; DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map
doubled_df = df.apply(lambda col : col*2)
print(doubled_df)

   Value1  Value2  Value3
A      27      27       7
B      21      14       1
C      46      40      26
D      25       9      34
E      34       3      17
   Value1  Value2  Value3
A      54      54      14
B      42      28       2
C      92      80      52
D      50      18      68
E      68       6      34




#### 2. Create a Pandas DataFrame with 3 columns and 6 rows filled with random integers. Apply a lambda function to create a new column that is the sum of the existing columns.

In [89]:
df = pd.DataFrame(np.random.randint(1,50,size=(6,3)),index=['A','B','C','D','E','F'],columns=['Value1','Value2','Value3'])
print(df)
# With axis=1 the lambda receives one row at a time
df['Sum'] = df.apply(lambda row : row.sum(),axis=1)
print(df)

   Value1  Value2  Value3
A      44       2       6
B      31      35      33
C      42      36      32
D      35       7      27
E      45      34      35
F      47      36      43
   Value1  Value2  Value3  Sum
A      44       2       6   52
B      31      35      33   99
C      42      36      32  110
D      35       7      27   69
E      45      34      35  114
F      47      36      43  126
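
The task asks for a lambda, but the same column can be produced without `apply`. The sketch below checks, on a small hand-written frame, that the vectorized `sum(axis=1)` gives the same result.

```python
import pandas as pd

df = pd.DataFrame({"Value1": [1, 2], "Value2": [3, 4], "Value3": [5, 6]})

via_apply = df.apply(lambda row: row.sum(), axis=1)  # Python-level call per row
via_sum = df.sum(axis=1)                             # vectorized equivalent

print(via_apply.equals(via_sum))
```

On large frames the vectorized form is usually much faster because it avoids calling a Python function for every row.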


### Assignment 10: Working with Text Data

#### 1. Create a Pandas Series with 5 random text strings. Convert all the strings to uppercase.

In [91]:
text_data = pd.Series(["NumPy","Pandas","Matplotlib","Seaborn","Random"])
print(text_data)
uppercase_data = text_data.str.upper()
print(uppercase_data)

0         NumPy
1        Pandas
2    Matplotlib
3       Seaborn
4        Random
dtype: object
0         NUMPY
1        PANDAS
2    MATPLOTLIB
3       SEABORN
4        RANDOM
dtype: object


#### 2. Create a Pandas Series with 5 random text strings. Extract the first three characters of each string.

In [94]:
text_data = pd.Series(["NumPy","Pandas","Matplotlib","Seaborn","Random"])
print(text_data)
sliced_data = text_data.str[:3]
print(sliced_data)

0         NumPy
1        Pandas
2    Matplotlib
3       Seaborn
4        Random
dtype: object
0    Num
1    Pan
2    Mat
3    Sea
4    Ran
dtype: object
