# Module: Pandas Assignments
## Lesson: Pandas
### Assignment 1: DataFrame Creation and Indexing

1. Create a Pandas DataFrame with 4 columns and 6 rows filled with random integers. Set the index to be the first column.
2. Create a Pandas DataFrame with columns 'A', 'B', 'C' and index 'X', 'Y', 'Z'. Fill the DataFrame with random integers and access the element at row 'Y' and column 'B'.

### Assignment 2: DataFrame Operations

1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Add a new column that is the product of the first two columns.
2. Create a Pandas DataFrame with 3 columns and 4 rows filled with random integers. Compute the row-wise and column-wise sum.

### Assignment 3: Data Cleaning

1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Introduce some NaN values. Fill the NaN values with the mean of the respective columns.
2. Create a Pandas DataFrame with 4 columns and 6 rows filled with random integers. Introduce some NaN values. Drop the rows with any NaN values.

### Assignment 4: Data Aggregation

1. Create a Pandas DataFrame with 2 columns: 'Category' and 'Value'. Fill the 'Category' column with random categories ('A', 'B', 'C') and the 'Value' column with random integers. Group the DataFrame by 'Category' and compute the sum and mean of 'Value' for each category.
2. Create a Pandas DataFrame with 3 columns: 'Product', 'Category', and 'Sales'. Fill the DataFrame with random data. Group the DataFrame by 'Category' and compute the total sales for each category.

### Assignment 5: Merging DataFrames

1. Create two Pandas DataFrames with a common column. Merge the DataFrames using the common column.
2. Create two Pandas DataFrames with different columns. Concatenate the DataFrames along the rows and along the columns.

### Assignment 6: Time Series Analysis

1. Create a Pandas DataFrame with a datetime index and one column filled with random integers. Resample the DataFrame to compute the monthly mean of the values.
2. Create a Pandas DataFrame with a datetime index ranging from '2021-01-01' to '2021-12-31' and one column filled with random integers. Compute the rolling mean with a window of 7 days.

### Assignment 7: MultiIndex DataFrame

1. Create a Pandas DataFrame with a MultiIndex (hierarchical index). Perform some basic indexing and slicing operations on the MultiIndex DataFrame.
2. Create a Pandas DataFrame with MultiIndex consisting of 'Category' and 'SubCategory'. Fill the DataFrame with random data and compute the sum of values for each 'Category' and 'SubCategory'.

### Assignment 8: Pivot Tables

1. Create a Pandas DataFrame with columns 'Date', 'Category', and 'Value'. Create a pivot table to compute the sum of 'Value' for each 'Category' by 'Date'.
2. Create a Pandas DataFrame with columns 'Year', 'Quarter', and 'Revenue'. Create a pivot table to compute the mean 'Revenue' for each 'Quarter' by 'Year'.

### Assignment 9: Applying Functions

1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Apply a function that doubles the values of the DataFrame.
2. Create a Pandas DataFrame with 3 columns and 6 rows filled with random integers. Apply a lambda function to create a new column that is the sum of the existing columns.

### Assignment 10: Working with Text Data

1. Create a Pandas Series with 5 random text strings. Convert all the strings to uppercase.
2. Create a Pandas Series with 5 random text strings. Extract the first three characters of each string.


In [101]:
import pandas as pd
import numpy as np

### Assignment 1: DataFrame Creation and Indexing

1. Create a Pandas DataFrame with 4 columns and 6 rows filled with random integers. Set the index to be the first column.

In [102]:


data=pd.DataFrame(np.random.randint(1,21,size=(6,4)),columns=['A','B','C','D'])
print("Original DataFrame \n",data)

Original DataFrame 
     A   B   C   D
0  19  19   7   6
1  17   5  14  10
2   7  12   8   5
3  14  14  14   8
4  18  13  18   1
5   6  15  13   1


In [103]:
data=pd.DataFrame(np.random.randint(1,21,size=(6,4)),columns=['A','B','C','D'],index=['one','two','three','four','five','six'])
print("Original DataFrame \n",data)

Original DataFrame 
         A   B   C   D
one     3   4  10   4
two     9   4   2  19
three   7   2   2  11
four   15  17  19   4
five    9  15  14   7
six    12   6  16  12


In [104]:

data=pd.DataFrame(np.random.randint(1,21,size=(6,4)),columns=['A','B','C','D'])
print("Original DataFrame \n",data)

Original DataFrame 
     A   B   C   D
0   7   8  13  19
1  15   4   2  20
2   4  16  13  14
3  12  19   1   4
4  19  14   9   7
5  20  12  17   9


In [105]:
data.set_index('A',inplace=True)#Uses the values of column 'A' as the row index; 'A' stops being a regular column
print("After setting index \n",data)

After setting index 
      B   C   D
A             
7    8  13  19
15   4   2  20
4   16  13  14
12  19   1   4
19  14   9   7
20  12  17   9


In [106]:
print(data.shape)#Now columns are reduced to 3

(6, 3)


2. Create a Pandas DataFrame with columns 'A', 'B', 'C' and index 'X', 'Y', 'Z'. Fill the DataFrame with random integers and access the element at row 'Y' and column 'B'.

In [107]:
data=pd.DataFrame(np.random.randint(1,21,size=(3,3)),columns=list("ABC"),index=list("XYZ"))
print("Original DataFrame \n",data)

Original DataFrame 
     A   B   C
X  15  19   1
Y  19  20   6
Z   1  15  19


In [108]:
data.loc["Y","B"]

np.int64(20)

In [109]:
print(data.loc["Y","B"])

20


In [110]:
data.at["Y","B"]

np.int64(20)

### Assignment 2: DataFrame Operations

1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Add a new column that is the product of the first two columns.

In [111]:
data=pd.DataFrame(np.random.randint(1,21,size=(5,3)))
print("Original DataFrame \n",data)

Original DataFrame 
     0   1   2
0  16   1   3
1   7   8  19
2   6   5   5
3  12  16  11
4   6   6   2


In [112]:
data["New column"]=data[0]*data[1]
print("After adding new column \n",data)

After adding new column 
     0   1   2  New column
0  16   1   3          16
1   7   8  19          56
2   6   5   5          30
3  12  16  11         192
4   6   6   2          36


2. Create a Pandas DataFrame with 3 columns and 4 rows filled with random integers. Compute the row-wise and column-wise sum.

In [113]:
data=pd.DataFrame(np.random.randint(1,21,size=(4,3)),columns=list("ABC"))
print("Original DataFrame \n",data)


Original DataFrame 
     A   B   C
0   5  19  11
1   9   4   9
2  19   7   2
3  10   2   6


In [114]:
row_sum=data.sum(axis=1)#axis=1 adds across the columns, giving one total per row
column_sum=data.sum(axis=0)#axis=0 adds down the rows, giving one total per column

print("Row sum \n",row_sum)
print("Column sum \n",column_sum)

Row sum 
 0    35
1    22
2    28
3    18
dtype: int64
Column sum 
 A    43
B    32
C    28
dtype: int64


### Assignment 3: Data Cleaning

1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Introduce some NaN values. Fill the NaN values with the mean of the respective columns.

In [115]:
data=pd.DataFrame(np.random.randint(1,21,size=(5,3)),columns=list("ABC"))
print("Original DataFrame \n",data)

Original DataFrame 
     A   B   C
0  12   5  15
1  13   7  12
2   8   2   9
3   8  14   4
4   4  14   3


In [116]:
data.iloc[0,0]=np.nan
data.iloc[1,1]=np.nan
data.iloc[2,2]=np.nan
data.iloc[3,0]=np.nan
data.iloc[4,1]=np.nan
print("After adding NaN values \n",data)

After adding NaN values 
       A     B     C
0   NaN   5.0  15.0
1  13.0   NaN  12.0
2   8.0   2.0   NaN
3   NaN  14.0   4.0
4   4.0   NaN   3.0


In [117]:
data.fillna(data.mean(),inplace=True)#Fills each NaN value with the mean of that column 
print("After filling NaN values with mean \n",data)

After filling NaN values with mean 
            A     B     C
0   8.333333   5.0  15.0
1  13.000000   7.0  12.0
2   8.000000   2.0   8.5
3   8.333333  14.0   4.0
4   4.000000   7.0   3.0


In [118]:
print(data.mean())

A    8.333333
B    7.000000
C    8.500000
dtype: float64


2. Create a Pandas DataFrame with 4 columns and 6 rows filled with random integers. Introduce some NaN values. Drop the rows with any NaN values.

In [119]:
data=pd.DataFrame(np.random.randint(1,21,size=(6,4)),columns=list("ABCD"))
print("Original DataFrame \n",data)

Original DataFrame 
     A   B   C   D
0  13   9   8  19
1  15   1  16  15
2   1   7   1  14
3   1   9  10   3
4  18  18   8  17
5  13   3  18   7


In [120]:
data.loc[0:2,"A"]=np.nan
data.loc[3,"B"]=np.nan
data.loc[4:5,"C"]=np.nan
data.loc[1,"D"]=np.nan
print("After adding NaN values \n",data)


After adding NaN values 
       A     B     C     D
0   NaN   9.0   8.0  19.0
1   NaN   1.0  16.0   NaN
2   NaN   7.0   1.0  14.0
3   1.0   NaN  10.0   3.0
4  18.0  18.0   NaN  17.0
5  13.0   3.0   NaN   7.0


In [121]:
data.isnull()

       A      B      C      D
0   True  False  False  False
1   True  False  False   True
2   True  False  False  False
3  False   True  False  False
4  False  False   True  False
5  False  False   True  False


In [122]:
data.isnull().any()

A    True
B    True
C    True
D    True
dtype: bool

In [123]:
data.isnull().any(axis=1)

0    True
1    True
2    True
3    True
4    True
5    True
dtype: bool

In [124]:
data.isnull().sum()

A    3
B    1
C    2
D    1
dtype: int64

In [125]:
data.drop(data.isnull().any(axis=1),inplace=True)#Wrong: drop() expects index labels, not a boolean mask, so this raises a KeyError
print("After dropping rows with NaN values \n",data)

KeyError: '[True, True, True, True, True, True] not found in axis'

In [42]:
data.dropna(inplace=True)#Drops every row that contains at least one NaN; here every row has one, so nothing is left
print("After dropping rows with NaN values \n",data)

After dropping rows with NaN values 
 Empty DataFrame
Columns: [A, B, C, D]
Index: []
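
The `KeyError` above comes from passing a boolean mask to `drop()`, which expects index labels. The sketch below shows three equivalent ways to remove rows that contain a NaN. The small fixed frame and the variable names are illustrative assumptions, not the notebook's random data; in the notebook every row had a NaN, which is why its result was empty.

```python
import numpy as np
import pandas as pd

# Small fixed frame (illustrative values) so the result is reproducible.
data = pd.DataFrame(
    {
        "A": [1.0, np.nan, 3.0, 4.0],
        "B": [5.0, 6.0, np.nan, 8.0],
        "C": [9.0, 10.0, 11.0, 12.0],
    }
)

# Boolean mask: True for every row that contains at least one NaN.
has_nan = data.isnull().any(axis=1)

# Option 1: boolean indexing keeps the rows where the mask is False.
kept_by_mask = data[~has_nan]

# Option 2: drop() needs index labels, so turn the mask into labels first.
kept_by_drop = data.drop(index=data.index[has_nan.to_numpy()])

# Option 3: dropna() does the same thing in one call.
kept_by_dropna = data.dropna()

print(kept_by_mask)
```

Rows 1 and 2 each contain a NaN, so all three options keep only rows 0 and 3.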


### Assignment 4: Data Aggregation

1. Create a Pandas DataFrame with 2 columns: 'Category' and 'Value'. Fill the 'Category' column with random categories ('A', 'B', 'C') and the 'Value' column with random integers. Group the DataFrame by 'Category' and compute the sum and mean of 'Value' for each category.

In [43]:
df=pd.DataFrame({'Column':np.random.choice(['A','B','C'],size=10),"Value":np.random.randint(1,21,size=10)})
print("Original DataFrame \n",df)

Original DataFrame 
   Column  Value
0      B      2
1      C     12
2      C      1
3      A      7
4      C      8
5      C     19
6      C     11
7      A     11
8      A      3
9      C      5


In [46]:
grouped=df.groupby('Column')['Value'].sum()
print(grouped)

Column
A    21
B     2
C    56
Name: Value, dtype: int64
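
The assignment asks for both the sum and the mean, and it names the grouping column 'Category', while the cell above computes only the sum on a column called 'Column'. A minimal sketch of one way to get both statistics in a single call, assuming a seeded NumPy generator (`rng`) for reproducibility:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded generator, an assumption for reproducibility
df = pd.DataFrame(
    {
        "Category": rng.choice(["A", "B", "C"], size=10),
        "Value": rng.integers(1, 21, size=10),  # integers from 1 to 20
    }
)

# agg() with a list of function names returns one column per statistic.
summary = df.groupby("Category")["Value"].agg(["sum", "mean"])
print(summary)
```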


2. Create a Pandas DataFrame with 3 columns: 'Product', 'Category', and 'Sales'. Fill the DataFrame with random data. Group the DataFrame by 'Category' and compute the total sales for each category.

In [47]:
df = pd.DataFrame({'Product': np.random.choice(['Prod1', 'Prod2', 'Prod3'], size=10), 'Category': np.random.choice(['A', 'B', 'C'], size=10), 'Sales': np.random.randint(1, 100, size=10)})
print("Original DataFrame:\n",df)

grouped=df.groupby('Category')['Sales'].sum()
print(grouped)

Original DataFrame:
   Product Category  Sales
0   Prod2        A     15
1   Prod2        B      2
2   Prod2        A     56
3   Prod1        A     59
4   Prod1        A     79
5   Prod2        A     34
6   Prod1        A     96
7   Prod3        C     28
8   Prod3        A     77
9   Prod2        B     95
Category
A    416
B     97
C     28
Name: Sales, dtype: int64


### Assignment 5: Merging DataFrames

1. Create two Pandas DataFrames with a common column. Merge the DataFrames using the common column.

In [65]:
df_1=pd.DataFrame({'key':np.random.choice(list("ABCD"),size=10),'Value-1':np.random.randint(1,21,size=10)})
df_2=pd.DataFrame({'key':np.random.choice(list("ABEF"),size=10),'Value-2':np.random.randint(20,41,size=10)})

print("Original DataFrame 1 \n",df_1)
print("Original DataFrame 2 \n",df_2)



Original DataFrame 1 
   key  Value-1
0   C        4
1   C       12
2   A       10
3   B        3
4   A        3
5   C       10
6   A       14
7   B       19
8   B       11
9   D        6
Original DataFrame 2 
   key  Value-2
0   E       29
1   F       20
2   F       39
3   B       28
4   A       39
5   E       24
6   F       23
7   F       34
8   E       35
9   E       22


In [66]:
merged=pd.merge(df_1,df_2,on='key',how="inner")
print("After merging \n",merged)

After merging 
   key  Value-1  Value-2
0   A       10       39
1   B        3       28
2   A        3       39
3   A       14       39
4   B       19       28
5   B       11       28


2. Create two Pandas DataFrames with different columns. Concatenate the DataFrames along the rows and along the columns.

In [67]:
df1=pd.DataFrame({'key':np.random.choice(list("ABCD"),size=5),'Value-1':np.random.randint(1,21,size=5)})
df2=pd.DataFrame({'key':np.random.choice(list("EFGH"),size=5),'Value-2':np.random.randint(20,41,size=5)})

print("Original DataFrame 1 \n",df1)
print("Original DataFrame 2 \n",df2)



Original DataFrame 1 
   key  Value-1
0   B       18
1   D        5
2   B       13
3   B        2
4   D        6
Original DataFrame 2 
   key  Value-2
0   G       32
1   G       22
2   H       32
3   E       39
4   F       40


In [68]:
print("Along the rows\n",pd.concat([df1,df2]))
print("Along the columns\n",pd.concat([df1,df2],axis=1))

Along the rows
   key  Value-1  Value-2
0   B     18.0      NaN
1   D      5.0      NaN
2   B     13.0      NaN
3   B      2.0      NaN
4   D      6.0      NaN
0   G      NaN     32.0
1   G      NaN     22.0
2   H      NaN     32.0
3   E      NaN     39.0
4   F      NaN     40.0
Along the columns
   key  Value-1 key  Value-2
0   B       18   G       32
1   D        5   G       22
2   B       13   H       32
3   B        2   E       39
4   D        6   F       40


In [69]:
df1=pd.DataFrame({'key-1':np.random.choice(list("ABCD"),size=5),'Value-1':np.random.randint(1,21,size=5)})
df2=pd.DataFrame({'key-2':np.random.choice(list("EFGH"),size=5),'Value-2':np.random.randint(20,41,size=5)})

print("Original DataFrame 1 \n",df1)
print("Original DataFrame 2 \n",df2)

Original DataFrame 1 
   key-1  Value-1
0     A       19
1     A       16
2     D        5
3     D       12
4     C        6
Original DataFrame 2 
   key-2  Value-2
0     F       28
1     G       28
2     F       36
3     F       31
4     H       30


In [70]:
print("Along the rows\n",pd.concat([df1,df2]))
print("Along the columns\n",pd.concat([df1,df2],axis=1))

Along the rows
   key-1  Value-1 key-2  Value-2
0     A     19.0   NaN      NaN
1     A     16.0   NaN      NaN
2     D      5.0   NaN      NaN
3     D     12.0   NaN      NaN
4     C      6.0   NaN      NaN
0   NaN      NaN     F     28.0
1   NaN      NaN     G     28.0
2   NaN      NaN     F     36.0
3   NaN      NaN     F     31.0
4   NaN      NaN     H     30.0
Along the columns
   key-1  Value-1 key-2  Value-2
0     A       19     F       28
1     A       16     G       28
2     D        5     F       36
3     D       12     F       31
4     C        6     H       30


In [71]:
# Just a try

df=pd.DataFrame([1,2,3,4,56,7],columns=['A','B','C','D','E','F'])

ValueError: Shape of passed values is (6, 1), indices imply (6, 6)

In [73]:
df=pd.DataFrame([1,2,3,4,56,7],columns=['A'])
print(df)

    A
0   1
1   2
2   3
3   4
4  56
5   7
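
The `ValueError` above arises because a flat list of six values is read as six rows of one column, which cannot be matched to six column names. A short sketch of the two shapes, assuming the goal was a single row with six columns:

```python
import pandas as pd

values = [1, 2, 3, 4, 56, 7]

# Wrapping the list in another list makes it one row with six columns.
one_row = pd.DataFrame([values], columns=list("ABCDEF"))

# A flat list is six rows of a single column.
one_col = pd.DataFrame(values, columns=["A"])

print(one_row.shape, one_col.shape)
```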


In [74]:
np.random.randint(1,21,size=(6,4))

array([[ 4, 18,  5,  4],
       [16,  5,  5, 20],
       [11, 18,  6, 14],
       [18,  3,  9,  8],
       [ 8,  2,  8,  5],
       [ 7,  7, 15,  4]])

In [76]:
values=np.random.randint(1,21,size=(6,4))
print(values)
df=pd.DataFrame(values,columns=['A','B','C','D'])
print(df)

[[ 4 18 13  2]
 [ 4 17 13 10]
 [15  6 19 17]
 [11 16  4  4]
 [ 8  5 10 18]
 [11  6  2 16]]
    A   B   C   D
0   4  18  13   2
1   4  17  13  10
2  15   6  19  17
3  11  16   4   4
4   8   5  10  18
5  11   6   2  16


### Assignment 6: Time Series Analysis

1. Create a Pandas DataFrame with a datetime index and one column filled with random integers. Resample the DataFrame to compute the monthly mean of the values.

The code demonstrates how to create a time series DataFrame, set the date as the index, and then resample it to compute the monthly mean of the data. Here's a detailed explanation:

---

### Step-by-Step Breakdown:

#### 1. **Creating a Date Range**
```python
date_rng = pd.date_range(start='2022-01-01', end='2022-12-31', freq='D')
```
- `pd.date_range`: Generates a sequence of dates.
- `start='2022-01-01'` and `end='2022-12-31'`: Define the range from January 1, 2022, to December 31, 2022.
- `freq='D'`: Specifies daily frequency.

#### 2. **Creating the DataFrame**
```python
df = pd.DataFrame(date_rng, columns=['date'])
```
- A DataFrame is created with a single column, `date`, containing the daily dates generated.

#### 3. **Adding Random Data**
```python
df['data'] = np.random.randint(0, 100, size=(len(date_rng)))
```
- A new column, `data`, is added to the DataFrame.
- `np.random.randint(0, 100, size=(len(date_rng)))`: Generates random integers between 0 and 99, with the same length as the date range.

#### 4. **Setting the Date as the Index**
```python
df.set_index('date', inplace=True)
```
- `set_index`: Sets the `date` column as the index, converting the DataFrame into a time series format.

#### Original DataFrame Example:
```
            data
date            
2022-01-01    45
2022-01-02    78
2022-01-03    12
...
2022-12-31    56
```

---

#### 5. **Resampling to Monthly Frequency**
```python
monthly_mean = df.resample('M').mean()
```
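- `resample('M')`: Groups the data by calendar month and labels each group with its month-end date. `'M'` is the month-end alias; pandas 2.2 and later spell it `'ME'` and emit a `FutureWarning` for `'M'`.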
- `.mean()`: Computes the mean of the `data` values for each month.

#### Resampled DataFrame (Monthly Mean):
The resulting `monthly_mean` DataFrame contains one row for each month, with the mean of the `data` values for all days in that month.

Example:
```
                 data
date                 
2022-01-31  48.354839
2022-02-28  52.928571
2022-03-31  50.258065
...
2022-12-31  49.677419
```
- **Index**: The last day of each month (month-end).
- **Values**: The mean of the `data` values for that month.

---

### Key Concepts:

1. **Resampling**:
   - Aggregates time series data at a different frequency (e.g., daily to monthly).
   - Common aggregation functions: `.mean()`, `.sum()`, `.count()`, etc.

2. **Time Series Index**:
   - Setting the `date` column as the index allows pandas to perform time-based operations efficiently.

3. **Use Case**:
   - Useful for analyzing trends over longer periods by reducing granularity (e.g., daily data to monthly summaries).


In [77]:
date_rng=pd.date_range(start="2024-01-01",end="2024-12-31")#The default frequency is daily, so this yields every date in the year
print(date_rng)

DatetimeIndex(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04',
               '2024-01-05', '2024-01-06', '2024-01-07', '2024-01-08',
               '2024-01-09', '2024-01-10',
               ...
               '2024-12-22', '2024-12-23', '2024-12-24', '2024-12-25',
               '2024-12-26', '2024-12-27', '2024-12-28', '2024-12-29',
               '2024-12-30', '2024-12-31'],
              dtype='datetime64[ns]', length=366, freq='D')


In [79]:
date_rng=pd.date_range(start="2024-01-01",end="2024-12-31",freq='M')
print(date_rng)

DatetimeIndex(['2024-01-31', '2024-02-29', '2024-03-31', '2024-04-30',
               '2024-05-31', '2024-06-30', '2024-07-31', '2024-08-31',
               '2024-09-30', '2024-10-31', '2024-11-30', '2024-12-31'],
              dtype='datetime64[ns]', freq='ME')




`freq='M'` generates dates at month-end frequency: one date for the last day of each month. pandas 2.2 and later spell this alias `'ME'`, which is why the output above reports `freq='ME'`, and they emit a `FutureWarning` when `'M'` is used.
With `start='2024-01-01'`, the first date generated is still 2024-01-31, because that is the last day of the first month.

In [80]:
date_rng=pd.date_range(start="2024-01-01",end="2024-12-31",freq="D")#freq="D" is the daily frequency (also the default), so this yields every date in the year
print(date_rng)

DatetimeIndex(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04',
               '2024-01-05', '2024-01-06', '2024-01-07', '2024-01-08',
               '2024-01-09', '2024-01-10',
               ...
               '2024-12-22', '2024-12-23', '2024-12-24', '2024-12-25',
               '2024-12-26', '2024-12-27', '2024-12-28', '2024-12-29',
               '2024-12-30', '2024-12-31'],
              dtype='datetime64[ns]', length=366, freq='D')


In [None]:
df=pd.DataFrame(date_rng,columns=['Date'])
print(df)

          Date
0   2024-01-01
1   2024-01-02
2   2024-01-03
3   2024-01-04
4   2024-01-05
..         ...
361 2024-12-27
362 2024-12-28
363 2024-12-29
364 2024-12-30
365 2024-12-31

[366 rows x 1 columns]


In [84]:
df["Data"]=np.random.randint(1,101,size=366)
print(df)

          Date  Data
0   2024-01-01    46
1   2024-01-02    47
2   2024-01-03    47
3   2024-01-04    58
4   2024-01-05    18
..         ...   ...
361 2024-12-27    73
362 2024-12-28    24
363 2024-12-29     7
364 2024-12-30    38
365 2024-12-31    50

[366 rows x 2 columns]


In [85]:
df.set_index('Date',inplace=True)
print(df)

            Data
Date            
2024-01-01    46
2024-01-02    47
2024-01-03    47
2024-01-04    58
2024-01-05    18
...          ...
2024-12-27    73
2024-12-28    24
2024-12-29     7
2024-12-30    38
2024-12-31    50

[366 rows x 1 columns]


In [87]:
monthly=df.resample('ME')
print(monthly)

DatetimeIndexResampler [freq=<MonthEnd>, axis=0, closed=right, label=right, convention=start, origin=start_day]



The `resample` method in pandas is a powerful tool for performing time-based group operations. It is commonly used in time series data to change the frequency of data points and perform aggregation or transformation operations.

Here’s a detailed explanation of how `resample` works, with a focus on `monthly_mean = df.resample('M').mean()`:

---

### **What Does `resample('M')` Do?**
1. **Changes the Frequency**: 
   - The `resample` method groups data into new time intervals (called bins) based on the frequency specified.
   - `freq='M'` groups the data by the **end of each month** (month-end frequency).
   
2. **Aggregation**:
   - After grouping, the specified aggregation function (e.g., `.mean()`) is applied to the data within each time interval.

---

### **Key Components of `resample`**
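1. **Frequency (`'M'`)**:
   - Determines how the data is grouped.
   - `'M'`: Groups data by month-end. pandas 2.2 and later spell this alias `'ME'` and warn when `'M'` is used.
   - Common frequency strings:
     - `'D'`: Daily.
     - `'MS'`: Month-start.
     - `'W'`: Weekly.
     - `'Q'`: Quarterly (`'QE'` in pandas 2.2 and later).
     - `'Y'`: Yearly (`'YE'` in pandas 2.2 and later).
     - `'H'`: Hourly (`'h'` in pandas 2.2 and later).
     - `'T'`: Minutes (`'min'` in pandas 2.2 and later).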
   
2. **Downsampling**:
   - Reducing the data frequency (e.g., from daily to monthly).
   - `df.resample('M')`: Converts a daily time series into a monthly time series.

3. **Upsampling**:
   - Increasing the data frequency (e.g., from monthly to daily).
   - When upsampling, you might need to fill missing values (`.ffill()`, `.bfill()`, etc.).

4. **Aggregation Functions**:
   - After resampling, you apply functions to aggregate data, such as:
     - `.mean()`: Compute the mean.
     - `.sum()`: Compute the sum.
     - `.count()`: Count non-NaN values.
     - `.max()` / `.min()`: Find max or min values.
     - Custom aggregation functions.
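
The upsampling case mentioned above can be sketched as follows. The three monthly values are made up for illustration, and the example uses the `'MS'` and `'D'` aliases, which behave the same across pandas versions.

```python
import pandas as pd

# Three month-start observations (illustrative values).
idx = pd.date_range("2024-01-01", periods=3, freq="MS")
monthly = pd.DataFrame({"data": [10, 20, 30]}, index=idx)

# Upsample to daily: asfreq() leaves the new rows empty (NaN).
daily_raw = monthly.resample("D").asfreq()

# ffill() carries the last known value forward into the new rows.
daily_filled = monthly.resample("D").ffill()

print(daily_raw.head())
print(daily_filled.head())
```

From 2024-01-01 to 2024-03-01 there are 61 days, so both results have 61 rows; only the three original dates carry values in `daily_raw`.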

---

### **How It Works in `monthly_mean = df.resample('M').mean()`**
1. **Input DataFrame**:
   ```python
   df
   ```
   ```
               data
   date             
   2022-01-01    10
   2022-01-02    15
   2022-01-03    20
   ...
   2022-12-31    25
   ```

2. **Step 1: Group Data by Month-End**:
   - The data is divided into groups where each group represents one month.
   - For example:
     - January 2022: All rows from `2022-01-01` to `2022-01-31`.
     - February 2022: All rows from `2022-02-01` to `2022-02-28`.

3. **Step 2: Compute the Mean for Each Group**:
   - The `.mean()` method computes the average of the `data` column for each month.

4. **Output DataFrame**:
   ```python
   monthly_mean
   ```
   ```
                  data
   date               
   2022-01-31  12.333
   2022-02-28  14.500
   2022-03-31  16.250
   ...
   2022-12-31  22.375
   ```
   - The `date` index now represents the **last day of each month**.
   - The `data` column contains the mean values for each month's data.

---

### **Key Notes on `resample`**
1. **Requires a DatetimeIndex**:
   - The DataFrame must have a time-based index (e.g., `DatetimeIndex`).
   - If not, convert a column to the index using `set_index()`.

2. **Handles Missing Data**:
   - Periods with no rows still appear in the result: `.mean()` returns `NaN` for an empty bin, while `.sum()` and `.count()` return 0.

3. **Custom Aggregations**:
   - Use `.agg()` to apply multiple or custom aggregation functions:
     ```python
     df.resample('M').agg(['mean', 'sum'])
     ```

4. **Flexible Alignment**:
   - Align by the start or end of time intervals using `MS` (month-start) or `M` (month-end).
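
As a small illustration of the notes above, the sketch below resamples a series that has no observations in February and applies several aggregations at once with `.agg()`. The dates and values are invented for the example, and `'MS'` is used so that each bin is labelled with the first day of its month.

```python
import pandas as pd

# Two observations in January, none in February, one in March.
idx = pd.to_datetime(["2024-01-10", "2024-01-20", "2024-03-05"])
df = pd.DataFrame({"data": [10, 30, 50]}, index=idx)

# One row per calendar month, labelled with the month-start date.
by_month = df.resample("MS")["data"].agg(["mean", "sum", "count"])
print(by_month)
```

February still gets a row even though it has no data: its mean is `NaN`, while its sum and count are 0.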

---

### **Use Cases**
- **Time Series Analysis**:
  - Analyze trends, compute rolling averages, or aggregate metrics.
- **Data Reduction**:
  - Reduce high-frequency data (e.g., hourly) into lower-frequency summaries (e.g., daily or monthly).
- **Forecasting**:
  - Prepare data for forecasting models by aggregating to relevant frequencies.


A concise breakdown of what happens in this case:

1. **Resampling (`resample('M')`)**:
   - Your original DataFrame has **366 rows** (daily data for a leap year like 2024).
   - The `resample('M')` groups these rows into **12 categories or groups**, where each group corresponds to a **month** (e.g., all rows for January, February, etc.).

2. **Aggregation Function (`mean()`)**:
   - Once the data is grouped by month, the **aggregate function (`mean()`)** is applied to each group.
   - This calculates the **average of the `data` column** for each month.

3. **Result**:
   - You get a new DataFrame with **12 rows**, where:
     - The **index** is the month-end date for each month.
     - The **value** is the mean of the `data` values for that month.

### Visualizing the Process

If the original data looks like this:
```
date        data
2024-01-01   10
2024-01-02   20
2024-01-03   30
...          ...
2024-01-31   50
```

For January 2024, `resample('M').mean()` calculates:
$$
\text{mean} = \frac{10 + 20 + 30 + \dots + 50}{31}
$$

And the output DataFrame would be:
```
date        data
2024-01-31   Mean value for Jan
2024-02-29   Mean value for Feb
...          ...
2024-12-31   Mean value for Dec
```

---

This is how `resample` simplifies time-series aggregation.

### So resampling turns the 366 daily rows into 12 monthly groups and then applies the aggregate function to each group

In [88]:
monthly_mean=df.resample("ME").mean()
print(monthly_mean)

                 Data
Date                 
2024-01-31  44.967742
2024-02-29  55.896552
2024-03-31  49.354839
2024-04-30  57.700000
2024-05-31  60.096774
2024-06-30  55.000000
2024-07-31  51.935484
2024-08-31  53.258065
2024-09-30  47.966667
2024-10-31  37.580645
2024-11-30  48.800000
2024-12-31  50.354839
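
One way to check the "12 groups" picture is to compare resampling with an explicit `groupby` on the calendar month. The sketch below assumes a seeded generator and fresh random data rather than the notebook's values, and it uses `'MS'` so that it runs unchanged on older and newer pandas versions; only the labels differ from month-end resampling, not the means.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded generator, an assumption for reproducibility
idx = pd.date_range("2024-01-01", "2024-12-31", freq="D")
df = pd.DataFrame({"Data": rng.integers(1, 101, size=len(idx))}, index=idx)

# Resampling: one bin per calendar month, labelled with the month start.
via_resample = df.resample("MS")["Data"].mean()

# Explicit grouping: map each date to its month, then average per month.
via_groupby = df.groupby(df.index.to_period("M"))["Data"].mean()

print(len(df), len(via_resample), len(via_groupby))
print(np.allclose(via_resample.to_numpy(), via_groupby.to_numpy()))
```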




2. Create a Pandas DataFrame with a datetime index ranging from '2021-01-01' to '2021-12-31' and one column filled with random integers. Compute the rolling mean with a window of 7 days.

In [127]:
date_rng=pd.date_range(start="2021-01-01",end="2021-12-31",freq="D")
df=pd.DataFrame({"Date":date_rng,"Data":np.random.randint(1,101,size=len(date_rng))})
print(df)

          Date  Data
0   2021-01-01    54
1   2021-01-02    43
2   2021-01-03    16
3   2021-01-04    21
4   2021-01-05    39
..         ...   ...
360 2021-12-27    70
361 2021-12-28    37
362 2021-12-29    13
363 2021-12-30    48
364 2021-12-31    20

[365 rows x 2 columns]


In [99]:
df.set_index('Date',inplace=True)
print(df)


            Data
Date            
2021-01-01    54
2021-01-02    43
2021-01-03    16
2021-01-04    21
2021-01-05    39
...          ...
2021-12-27    70
2021-12-28    37
2021-12-29    13
2021-12-30    48
2021-12-31    20

[365 rows x 1 columns]


In [126]:
rolling_mean=df.rolling(window=7).mean()
print(rolling_mean)

                 Data
Date                 
2021-01-01        NaN
2021-01-02        NaN
2021-01-03        NaN
2021-01-04        NaN
2021-01-05        NaN
...               ...
2021-12-27  69.285714
2021-12-28  65.000000
2021-12-29  56.714286
2021-12-30  53.857143
2021-12-31  42.857143

[365 rows x 1 columns]


### Explanation of Rolling Mean and NaN Values:

#### **What is a Rolling Mean?**
A rolling mean (or moving average) is a statistical method used to smooth out short-term fluctuations and highlight longer-term trends in data. The method calculates the average of a window of values, which "rolls" through the data.

For example, with `window=7`, the rolling mean takes the mean of 7 consecutive values, starting from the first row, then moves forward by one row to repeat the process.

#### **Why are NaN Values Present in the Output?**
When using a rolling window of size 7:
1. The first 6 rows of the output will have `NaN` because there aren’t enough data points to fill the window.
2. Once the window reaches the 7th row, the rolling mean can be calculated.

#### Example:
Given the column `A`:
```
0  10
1   9
2  17
3   5
4  20
5  19
6   6
```
- The first 6 rows (0 to 5) will be `NaN`.
- At row 6 the window is full for the first time, and the rolling mean is the mean of rows [0,1,2,3,4,5,6]: (10 + 9 + 17 + 5 + 20 + 19 + 6) / 7 = 86 / 7 ≈ 12.29.

#### Correct Approach to Dropping NaN Rows:
When working with rolling means, if you want to remove rows with `NaN`, you can use the `.dropna()` method:
```python
rolling_mean = df.rolling(window=7).mean()
rolling_mean.dropna(inplace=True)
print(rolling_mean)
```

To drop rows selected by a boolean mask, filter with the negated mask (for example `df[~mask]`) or pass the matching index labels to `drop()`. Passing the mask itself to `drop()` raises a `KeyError`, as seen in Assignment 3.
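
The assignment asks for a window of 7 days, while `window=7` means 7 rows. For gap-free daily data the two cover the same dates, but they differ at the start of the series. The sketch below compares them on ten made-up daily values; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2021-01-01", periods=10, freq="D")
df = pd.DataFrame({"Data": np.arange(1, 11)}, index=idx)

# Row-based window: needs 7 rows before it produces a value.
by_rows = df["Data"].rolling(window=7).mean()

# Time-based window: covers the last 7 days and, by default,
# produces a value as soon as one observation is available.
by_time = df["Data"].rolling(window="7D").mean()

# Row-based window that also accepts partial windows.
by_rows_partial = df["Data"].rolling(window=7, min_periods=1).mean()

print(pd.DataFrame({"rows": by_rows, "time": by_time}))
```

On this regular daily index, `by_time` matches `by_rows_partial`; with missing dates the time-based window would span fewer rows.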

### Assignment 7: MultiIndex DataFrame

1. Create a Pandas DataFrame with a MultiIndex (hierarchical index). Perform some basic indexing and slicing operations on the MultiIndex DataFrame.

In [132]:
arrays=[['A','A','B','B'],['one','two','one','two']]
index=pd.MultiIndex.from_arrays(arrays,names=['Category','Subcategory'])
df=pd.DataFrame(np.random.randint(1,21,size=(4,3)),columns=['Value-1','Value-2','Value-3'],index=index)
print(df)

                      Value-1  Value-2  Value-3
Category Subcategory                           
A        one                1       12        2
         two                6        6       13
B        one               10       14        4
         two               15        7       16


In [137]:
print("Indexing at Category 'A':")
df.loc['A']

Indexing at Category 'A':


             Value-1  Value-2  Value-3
Subcategory                           
one                1       12        2
two                6        6       13


In [138]:
print("Slicing at Category 'A' and SubCategory 'two':")
df.loc['A','two']

Slicing at Category 'A' and SubCategory 'two':


Value-1     6
Value-2     6
Value-3    13
Name: (A, two), dtype: int64

In [136]:
df.loc[('A','two'),'Value-1']#One .loc call with a (row tuple, column) pair avoids chained indexing

np.int64(6)
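
Beyond selecting by the outer level, a MultiIndex can also be selected by its inner level. A brief sketch, reusing the first two value columns printed above as fixed data, with `xs` and `pd.IndexSlice` as two possible approaches:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], ["one", "two", "one", "two"]],
    names=["Category", "Subcategory"],
)
df = pd.DataFrame(
    {"Value-1": [1, 6, 10, 15], "Value-2": [12, 6, 14, 7]}, index=index
)

# A single cell: row given as a tuple of both levels, plus the column.
cell = df.loc[("A", "two"), "Value-1"]

# Cross-section: every row whose Subcategory is 'one' (that level is dropped).
ones = df.xs("one", level="Subcategory")

# IndexSlice: all categories, Subcategory 'two', all columns (both levels kept).
idx = pd.IndexSlice
twos = df.loc[idx[:, "two"], :]

print(cell)
print(ones)
print(twos)
```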

2. Create a Pandas DataFrame with MultiIndex consisting of 'Category' and 'SubCategory'. Fill the DataFrame with random data and compute the sum of values for each 'Category' and 'SubCategory'.

In [7]:
import pandas as pd
import numpy as np

arrays=[['A','A','B','B'],['one','two','one','two']]
index=pd.MultiIndex.from_arrays(arrays,names=['Category','Subcategory'])
df=pd.DataFrame(np.random.randint(1,31,size=(4,3)),columns=['Value-1','Value-2','Value-3'],index=index)
print(df)

                      Value-1  Value-2  Value-3
Category Subcategory                           
A        one               19       19       21
         two               30        2       28
B        one               21       28        6
         two                4       15        3


In [10]:
category_sum=df.groupby('Category').sum()
print("The sum by category is: \n ",category_sum)

The sum by category is: 
            Value-1  Value-2  Value-3
Category                           
A              49       21       49
B              25       43        9


In [12]:
subcategory_sum=df.groupby('Subcategory').sum()
print("The sum by subcategory is: \n ",subcategory_sum)

The sum by subcategory is: 
               Value-1  Value-2  Value-3
Subcategory                           
one               40       47       27
two               34       17       31


In [14]:
subcategory_sum=df.groupby('Category')['Subcategory'].sum()
print("The sum by subcategory is: \n ",subcategory_sum)

KeyError: 'Column not found: Subcategory'

The `KeyError` occurs because `'Subcategory'` is a level of the index, not a column, so it cannot be selected with `[...]` after `groupby`. Passing both level names to `groupby`, as in the cells below, gives the intended result.

In [16]:
arrays=[['A','A','A','A','B','B'],['one','one','two','two','one','two']]
index=pd.MultiIndex.from_arrays(arrays,names=['Category','Subcategory'])
df=pd.DataFrame(np.random.randint(1,31,size=(6,3)),columns=['Value-1','Value-2','Value-3'],index=index)
print(df)

                      Value-1  Value-2  Value-3
Category Subcategory                           
A        one                1        7        8
         one                2       29       17
         two                2        5       13
         two               19       18       28
B        one               12       29       11
         two               18       20        2


In [18]:
#Group by both index levels: rows that share the same (Category, Subcategory) pair are summed together.
subcategory_sum=df.groupby(['Category','Subcategory']).sum()
print("The sum by subcategory is: \n ",subcategory_sum)

The sum by subcategory is: 
                        Value-1  Value-2  Value-3
Category Subcategory                           
A        one                3       36       25
         two               21       23       41
B        one               12       29       11
         two               18       20        2
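
As a hedged sketch of two equivalent ways to group on index levels, the example below uses `groupby(level=...)` and, alternatively, `reset_index()` followed by an ordinary column-based `groupby`. The frame `demo` and its values are assumptions chosen so the sums are easy to check by hand.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'A', 'B'], ['one', 'one', 'two', 'one']],
    names=['Category', 'Subcategory'],
)
demo = pd.DataFrame({'Value': [1, 2, 3, 4]}, index=index)

# Option 1: name the index levels explicitly.
by_level = demo.groupby(level=['Category', 'Subcategory']).sum()

# Option 2: turn the levels into ordinary columns, then group as usual.
flat = demo.reset_index()
by_column = flat.groupby(['Category', 'Subcategory'])['Value'].sum()

print(by_level)
print(by_column)
```

The second form is the one to reach for when a later step needs `'Subcategory'` as a real column, which is what the failing cell above assumed.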


### Assignment 8: Pivot Tables

1. Create a Pandas DataFrame with columns 'Date', 'Category', and 'Value'. Create a pivot table to compute the sum of 'Value' for each 'Category' by 'Date'.

In [21]:
date_rng=pd.date_range(start='2025-01-01',end='2025-01-05',freq='D')
df=pd.DataFrame({'Date':date_rng,'Category':np.random.choice(list('ABC'),size=5),'Value':np.random.randint(1,101,size=5)})
print("Original Table\n",df)

Original Table
         Date Category  Value
0 2025-01-01        C     47
1 2025-01-02        B     26
2 2025-01-03        C     36
3 2025-01-04        A     94
4 2025-01-05        A     22


In [None]:
pivot_table=pd.pivot_table(values='Value', # The column whose values you want to aggregate.
                           index='Date',  # The rows of the pivot table, based on the unique values of 'Date'.
                           columns='Category', # The columns of the pivot table, based on the unique values of 'Category'.
                           aggfunc='sum', # The aggregation function to apply; here, it sums the 'Value' column.
                           data=df)
print("Pivot Table\n",pivot_table)

Pivot Table
 Category       A     B     C
Date                        
2025-01-01   NaN   NaN  47.0
2025-01-02   NaN  26.0   NaN
2025-01-03   NaN   NaN  36.0
2025-01-04  94.0   NaN   NaN
2025-01-05  22.0   NaN   NaN


The `pd.pivot_table` function in Pandas is a powerful tool for reshaping data into a tabular format. The call used above breaks down as follows:

### Code Breakdown:
```python
pivot_table = pd.pivot_table(
    values='Value',         # The column whose values you want to aggregate.
    index='Date',           # The rows of the pivot table, based on the unique values of 'Date'.
    columns='Category',     # The columns of the pivot table, based on the unique values of 'Category'.
    aggfunc='sum',          # The aggregation function to apply; here, it sums the 'Value' column.
    data=df                 # The source DataFrame.
)
```

### How It Works:
1. **`values='Value'`:** This specifies the column whose values will be aggregated and displayed in the table. In this case, the `Value` column is selected.
2. **`index='Date'`:** The unique values from the `Date` column form the rows of the pivot table.
3. **`columns='Category'`:** The unique values from the `Category` column form the columns of the pivot table.
4. **`aggfunc='sum'`:** The aggregation function is specified as `sum`, so the pivot table will sum up the values of the `Value` column for each combination of `Date` and `Category`.
5. **`data=df`:** This specifies the source DataFrame from which the data is drawn.

### Output:
#### Original Table (Example):
```
        Date Category  Value
0 2025-01-01        C     85
1 2025-01-02        A     23
2 2025-01-03        B     59
3 2025-01-04        C     42
4 2025-01-05        A     12
```

#### Pivot Table:
```
Category         A     B     C
Date                         
2025-01-01    NaN   NaN  85.0
2025-01-02   23.0   NaN   NaN
2025-01-03    NaN  59.0   NaN
2025-01-04    NaN   NaN  42.0
2025-01-05   12.0   NaN   NaN
```

### Explanation of Pivot Table Output:
- The **rows** are unique dates from the `Date` column.
- The **columns** are unique categories from the `Category` column.
- The **cell values** are the sum of `Value` for each combination of `Date` and `Category`. If there is no value for a specific combination, the cell contains `NaN` (not a number).

### Key Features:
- Missing combinations (e.g., no 'B' category for `2025-01-01`) result in `NaN` by default.
- The pivot table effectively summarizes data in a compact and easy-to-analyze format.
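
One way to see what the pivot table does internally is to reproduce it with `groupby` and `unstack`. This is a sketch on a small hand-written frame (`demo` and its values are assumptions), not a description of how pandas implements `pivot_table`.

```python
import pandas as pd

demo = pd.DataFrame({
    'Date': pd.to_datetime(['2025-01-01', '2025-01-01', '2025-01-02']),
    'Category': ['A', 'B', 'A'],
    'Value': [10, 20, 30],
})

pivoted = pd.pivot_table(demo, values='Value', index='Date',
                         columns='Category', aggfunc='sum')

# Aggregate per (Date, Category) pair, then move Category into the columns.
manual = demo.groupby(['Date', 'Category'])['Value'].sum().unstack('Category')

print(pivoted)
print(manual)
```

Both tables have no entry for category 'B' on 2025-01-02, so that cell is `NaN`, and the presence of `NaN` is also why the integer inputs are displayed as floats.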

In [27]:
import pandas as pd
import numpy as np

# Create sample data
data = {
    'Date': pd.date_range(start='2025-01-01', periods=8, freq='D').tolist() * 2,
    'Category': ['A', 'B', 'A', 'C', 'B', 'C', 'A', 'B', 'C', 'A', 'C', 'B', 'A', 'C', 'B', 'A'],
    'Value': np.random.randint(1, 101, size=16)
}

df = pd.DataFrame(data)

print("Original Table:")
print(df)

# Create a pivot table
pivot_table = pd.pivot_table(
    values='Value',
    index='Date',
    columns='Category',
    aggfunc='sum',
    data=df,
    fill_value=0  # Replace NaN with 0 for better readability
)




Original Table:
         Date Category  Value
0  2025-01-01        A      7
1  2025-01-02        B     38
2  2025-01-03        A     65
3  2025-01-04        C     82
4  2025-01-05        B     88
5  2025-01-06        C     22
6  2025-01-07        A     71
7  2025-01-08        B     64
8  2025-01-01        C     45
9  2025-01-02        A     84
10 2025-01-03        C     45
11 2025-01-04        B      7
12 2025-01-05        A     31
13 2025-01-06        C    100
14 2025-01-07        B     40
15 2025-01-08        A     79


In [28]:
print("\nPivot Table:")
print(pivot_table)


Pivot Table:
Category     A   B    C
Date                   
2025-01-01   7   0   45
2025-01-02  84  38    0
2025-01-03  65   0   45
2025-01-04   0   7   82
2025-01-05  31  88    0
2025-01-06   0   0  122
2025-01-07  71  40    0
2025-01-08  79  64    0


The `'Date'` entry in the `data` dictionary above builds a list in which the sequence of 8 dates appears twice. Here's a detailed explanation of how it works:

### Code Breakdown:
```python
'Date': pd.date_range(start='2025-01-01', periods=8, freq='D').tolist() * 2
```

1. **`pd.date_range(start='2025-01-01', periods=8, freq='D')`:**
   - **`start='2025-01-01'`:** Specifies the start date of the range.
   - **`periods=8`**: Specifies the number of dates to generate (8 dates in total).
   - **`freq='D'`**: Specifies the frequency as daily, so the dates will be consecutive days.
   - **Output:** A Pandas `DatetimeIndex` object containing 8 consecutive dates from `2025-01-01` to `2025-01-08`.

   Example Output:
   ```
   DatetimeIndex(['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04',
                  '2025-01-05', '2025-01-06', '2025-01-07', '2025-01-08'],
                 dtype='datetime64[ns]', freq='D')
   ```

2. **`.tolist()`:**
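   - Converts the `DatetimeIndex` object into a regular Python list.
   - **Output:** A list of `pd.Timestamp` objects, one per date (not plain strings).

   Example Output (dates shown without the `Timestamp(...)` wrapper for brevity):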
   ```
   ['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04',
    '2025-01-05', '2025-01-06', '2025-01-07', '2025-01-08']
   ```

3. **`* 2`:**
   - This duplicates the list by concatenating it with itself.
   - **Output:** A list containing two copies of the dates.

   Example Output:
   ```
   ['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04',
    '2025-01-05', '2025-01-06', '2025-01-07', '2025-01-08',
    '2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04',
    '2025-01-05', '2025-01-06', '2025-01-07', '2025-01-08']
   ```

### Why Use This?
The line creates a `Date` column in which each date appears twice (the whole 8-day sequence is repeated once). This simulates multiple data points (e.g., transactions, events) occurring on the same day, which is common in real-world datasets, and it shows how a pivot table aggregates repeated entries: rows with the same `Date` and `Category` are combined into one cell by `aggfunc`.
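
A small sketch of that behavior, assuming a hand-written frame `demo` with only two dates so the arithmetic is easy to follow:

```python
import pandas as pd

dates = pd.date_range(start='2025-01-01', periods=2, freq='D').tolist() * 2
print(type(dates[0]).__name__)  # the list elements are pandas Timestamp objects

demo = pd.DataFrame({
    'Date': dates,                     # Jan 1, Jan 2, Jan 1, Jan 2
    'Category': ['A', 'B', 'A', 'A'],
    'Value': [1, 2, 3, 4],
})

# (2025-01-01, 'A') occurs twice, so its values 1 and 3 are summed into one cell.
summed = pd.pivot_table(demo, values='Value', index='Date',
                        columns='Category', aggfunc='sum', fill_value=0)
print(summed)
```

`DataFrame.pivot`, which does no aggregation, would raise a `ValueError` on this data because of the duplicate `(Date, Category)` pair; `pivot_table` is the variant meant for that case.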

2. Create a Pandas DataFrame with columns 'Year', 'Quarter', and 'Revenue'. Create a pivot table to compute the mean 'Revenue' for each 'Quarter' by 'Year'.

In [30]:
df=pd.DataFrame({'Year':np.random.choice([2020,2021,2022,2023,2024],size=10),'Quarter':np.random.choice(['Q1','Q2','Q3','Q4'],size=10),'Revenue':np.random.randint(1,21,size=10)},index=range(1,11))
print("Original Table\n",df)

Original Table
     Year Quarter  Revenue
1   2021      Q1        7
2   2024      Q2        3
3   2024      Q2       10
4   2020      Q2        7
5   2020      Q4        6
6   2020      Q4       17
7   2024      Q3        7
8   2023      Q4       12
9   2024      Q4        8
10  2024      Q3        8


In [33]:
pivot_table=pd.pivot_table(data=df,
                           index='Year',
                           columns='Quarter',
                           values='Revenue',
                           aggfunc='mean', # The task asks for the mean revenue, not the sum.
                           fill_value=0
                           )

print("Pivot Table\n",pivot_table)

Pivot Table
 Quarter   Q1   Q2   Q3    Q4
Year                        
2020     0.0  7.0  0.0  11.5
2021     7.0  0.0  0.0   0.0
2023     0.0  0.0  0.0  12.0
2024     0.0  6.5  7.5   8.0


### Assignment 9: Applying Functions

1. Create a Pandas DataFrame with 3 columns and 5 rows filled with random integers. Apply a function that doubles the values of the DataFrame.

In [35]:
df=pd.DataFrame(np.random.randint(1,101,size=(5,3)),index=list('ABCDE'),columns=['Value-1','Value-2','Value-3'])
print("Original Table\n",df)

Original Table
    Value-1  Value-2  Value-3
A       31       43       41
B       65       19       95
C       49       56       56
D       88       29        8
E       63       94       26


In [36]:
df_doubled=df.applymap(lambda x:x*2)#applymap :Apply a function to a Dataframe elementwise.
print("Doubled Table\n",df_doubled)

Doubled Table
    Value-1  Value-2  Value-3
A       62       86       82
B      130       38      190
C       98      112      112
D      176       58       16
E      126      188       52


`DataFrame.applymap` is deprecated since pandas 2.1 and emits a `FutureWarning`; `DataFrame.map`, used in the next cell, is its replacement.


In [37]:
df_doubled=df.map(lambda x:x*2)#map:Apply a function to a Dataframe elementwise.
print("Doubled Table\n",df_doubled)

Doubled Table
    Value-1  Value-2  Value-3
A       62       86       82
B      130       38      190
C       98      112      112
D      176       58       16
E      126      188       52
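
For simple arithmetic such as doubling, a per-element lambda is not needed, because DataFrame arithmetic is already applied to every cell. A minimal sketch, with `demo` as an assumed stand-in for the random frame above:

```python
import pandas as pd

demo = pd.DataFrame({'Value-1': [1, 2], 'Value-2': [3, 4]}, index=['A', 'B'])

# Arithmetic on a DataFrame is applied to every cell.
doubled_vectorized = demo * 2

# apply() passes one column (a Series) at a time; this also runs on older pandas.
doubled_by_column = demo.apply(lambda col: col * 2)

print(doubled_vectorized)
print(doubled_vectorized.equals(doubled_by_column))
```

The element-wise `map` remains useful when the function cannot be expressed as a whole-column operation, for example one that formats each value as a string.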


2. Create a Pandas DataFrame with 3 columns and 6 rows filled with random integers. Apply a lambda function to create a new column that is the sum of the existing columns.

In [45]:
df=pd.DataFrame(np.random.randint(1,101,size=(6,3)),index=list('ABCDEF'),columns=['Value-1','Value-2','Value-3'])
print("Original Table\n",df)

Original Table
    Value-1  Value-2  Value-3
A       26        2       65
B       79        6       83
C       96       61       60
D       73       97       83
E       73       41       84
F       53       46       83


In [46]:
df['Sum of Rows ']=df.apply(lambda x:x.sum(),axis=1)#apply:applies function to each row or column
print("New Column Table\n",df)

New Column Table
    Value-1  Value-2  Value-3  Sum of Rows 
A       26        2       65            93
B       79        6       83           168
C       96       61       60           217
D       73       97       83           253
E       73       41       84           198
F       53       46       83           182


In [47]:
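#axis=0 returns one sum per column, indexed by column name. Assigning that result as a new column
#aligns it on the row labels A-F, none of which match, so every value becomes NaN.
df['Sum of Columns ']=df.apply(lambda x:x.sum(),axis=0)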
print("New Row Table\n",df)


New Row Table
    Value-1  Value-2  Value-3  Sum of Rows   Sum of Columns 
A       26        2       65            93              NaN
B       79        6       83           168              NaN
C       96       61       60           217              NaN
D       73       97       83           253              NaN
E       73       41       84           198              NaN
F       53       46       83           182              NaN


In [48]:
df.dtypes

Value-1              int64
Value-2              int64
Value-3              int64
Sum of Rows          int64
Sum of Columns     float64
dtype: object
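
The all-`NaN` column (and its `float64` dtype) comes from index alignment: column sums are labeled by column name, so they fit as a new row rather than a new column. The sketch below shows both placements on an assumed small frame `demo`; the labels `'Row total'` and `'Total'` are hypothetical names chosen for illustration.

```python
import pandas as pd

demo = pd.DataFrame(
    {'Value-1': [1, 2, 3], 'Value-2': [10, 20, 30]},
    index=['A', 'B', 'C'],
)

row_totals = demo.sum(axis=1)   # indexed by A, B, C: aligns with the rows
col_totals = demo.sum(axis=0)   # indexed by Value-1, Value-2: does not

# Assigning col_totals as a column aligns on row labels, so nothing matches.
misaligned = demo.assign(bad=col_totals)
print(misaligned['bad'].isna().all())

# Row sums fit as a new column; column sums fit as a new row.
with_totals = demo.assign(**{'Row total': row_totals})
with_totals.loc['Total'] = with_totals.sum(axis=0)
print(with_totals)
```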

### Assignment 10: Working with Text Data

1. Create a Pandas Series with 5 random text strings. Convert all the strings to uppercase.

In [49]:
text_data = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])
print("Original Series:\n",text_data)

Original Series:
 0         apple
1        banana
2        cherry
3          date
4    elderberry
dtype: object


In [51]:
upper_data=text_data.str.upper()
print("Upper Case Series:\n",upper_data)

Upper Case Series:
 0         APPLE
1        BANANA
2        CHERRY
3          DATE
4    ELDERBERRY
dtype: object


`text_data` is a Pandas Series, and the dot operator is the usual way to call its methods or access its attributes. The `str` part of `text_data.str.upper()` is a special **accessor** in Pandas, used specifically for **vectorized string operations**. Let's break it down step-by-step:

### **What is `str` in Pandas?**
- The `.str` accessor is available on a Series (including a single DataFrame column) and on an Index, but not on a DataFrame as a whole. It is a **string methods accessor** that applies string functions to each element, which is why string operations like `upper()` can be called through it.

- **Pandas `str` Accessor:** It allows you to perform operations like:
  - `text_data.str.upper()` to convert text to uppercase.
  - `text_data.str.lower()` to convert text to lowercase.
  - `text_data.str.contains('pattern')` to check if a pattern exists in the text.

These methods work **on the individual string values** within the Series, rather than on the Series object as a whole.

### **How Does It Work?**
When you use `.str`:
- It **wraps the Series** (or column) of string-like objects in an object that has methods specifically for string manipulation.
- The methods available in `.str` work on **each element** of the Series; the original Series is left unchanged.

### **Calling `upper()` with `.str`**
Here’s how it works in this case:
```python
upper_data = text_data.str.upper()
```

1. **`text_data`** is a Series containing string-like data.
2. **`.str`** is the accessor that Pandas provides to apply string functions to the individual elements of the Series.
3. **`upper()`** is the accessor method that converts each string to uppercase.

So, `upper()` is applied **element-wise** to all string values in `text_data`, and it returns a new Series with those strings converted to uppercase.

### **Why Not Just One Dot Operator?**
You can think of `.str` as a namespace for a **special group of functions** designed for string manipulation. A Series has no `upper()`, `lower()`, or `contains()` methods of its own, so these string functions are reached through `.str`.

### **Example**

The accessor works the same way on a single DataFrame column. Suppose `text_data` is a DataFrame like this:
```python
import pandas as pd

# Sample DataFrame
text_data = pd.DataFrame({
    'Name': ['alice', 'bob', 'charlie', 'diana']
})

# Convert the 'Name' column to uppercase
upper_data = text_data['Name'].str.upper()
print(upper_data)
```

**Output:**
```
0      ALICE
1        BOB
2    CHARLIE
3      DIANA
Name: Name, dtype: object
```

Here, `.str.upper()` is applied to the **'Name' column**, and each name is converted to uppercase.

### **Summary**
- **`.str`** is an accessor for string operations in Pandas.
- It allows you to perform vectorized string operations on Series (or columns) of string-like objects.
- Methods like `upper()`, `lower()`, `replace()`, etc., are available when you use the `.str` accessor. 
- Calling a string method directly on a Series (for example `text_data.upper()`) raises an `AttributeError`. A DataFrame has no `.str` accessor at all, so select a column first.
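
A short sketch of these points, using an assumed Series `fruits` that includes a missing value to show how the accessor treats it:

```python
import pandas as pd

fruits = pd.Series(['apple', 'banana', None])

# A Series has no upper() method of its own.
try:
    fruits.upper()
except AttributeError as exc:
    print('AttributeError:', exc)

upper = fruits.str.upper()           # the missing value stays missing
has_an = fruits.str.contains('an')   # one result per element

print(upper)
print(has_an)
```

Because missing values pass through instead of raising, the `.str` methods are usually safer than applying Python's own `str.upper` to each element by hand.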

2. Create a Pandas Series with 5 random text strings. Extract the first three characters of each string.

In [52]:
names=pd.Series(['Messi','Ronaldo','Neymar','Pele','Harry Kane'])
print("Original Series:\n",names)

first_three=names.str[:3]
print("First three characters:\n",first_three)

Original Series:
 0         Messi
1       Ronaldo
2        Neymar
3          Pele
4    Harry Kane
dtype: object
First three characters:
 0    Mes
1    Ron
2    Ney
3    Pel
4    Har
dtype: object
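
Indexing through `.str` follows ordinary Python slice rules, so strings shorter than three characters are returned whole rather than raising an error. A minimal sketch with an assumed Series `short_names`, also showing the equivalent `str.slice` method:

```python
import pandas as pd

short_names = pd.Series(['Messi', 'Al', ''])

first_three = short_names.str[:3]        # slice syntax
same = short_names.str.slice(0, 3)       # equivalent method form

print(first_three.tolist())
print(first_three.equals(same))
```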
