## 1

Five commonly used pandas functions, with an example of each:

1. `pd.read_csv()`: Reads data from a CSV (comma-separated values) file and returns a DataFrame.

   import pandas as pd

   df = pd.read_csv('example.csv')

2. `df.head()`: Returns the first rows of a DataFrame (five by default), which helps in quickly inspecting the data.

   df.head()

3. `df.info()`: Prints a summary of the DataFrame, including column data types, non-null counts, and memory usage.

   df.info()

4. `df.groupby()`: Groups rows by one or more columns so that each group can be aggregated.

   grouped_data = df.groupby('Category')['Price'].mean()

5. `df.describe()`: Generates summary statistics for the numeric columns of the DataFrame, such as count, mean, standard deviation, and quartiles.

   df.describe()
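
`pd.read_csv()` and the `groupby(...).mean()` aggregation from the list above can be tried without a file on disk. The following is a minimal sketch: the inline CSV text and its `Category`/`Price` values are made up for illustration, and `io.StringIO` stands in for a real file such as `example.csv`.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO lets read_csv treat a string like a file
csv_text = """Category,Price
Fruit,2.0
Fruit,4.0
Dairy,3.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Mean price per category, returned as a Series indexed by Category
grouped_data = df.groupby('Category')['Price'].mean()
print(grouped_data)
```

Selecting `['Price']` before calling `mean()` restricts the aggregation to that one column.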

In [14]:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David','Mona','Chandler','Joey','Monica','Rachel'],
        'Age': [25, 30, 35, 40,25,40,50,40,30],
        'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago','New York', 'San Francisco', 'New York', 'Chicago','New York']}

df = pd.DataFrame(data)

print(df)


       Name  Age           City
0     Alice   25       New York
1       Bob   30  San Francisco
2   Charlie   35    Los Angeles
3     David   40        Chicago
4      Mona   25       New York
5  Chandler   40  San Francisco
6      Joey   50       New York
7    Monica   40        Chicago
8    Rachel   30       New York


In [15]:
df.head()

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
3    David   40        Chicago
4     Mona   25       New York


In [16]:
df.describe()

             Age
count   9.000000
mean   35.000000
std     8.291562
min    25.000000
25%    30.000000
50%    35.000000
75%    40.000000
max    50.000000


In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    9 non-null      object
 1   Age     9 non-null      int64 
 2   City    9 non-null      object
dtypes: int64(1), object(2)
memory usage: 348.0+ bytes


In [18]:
df.groupby('City').count()

               Name  Age
City
Chicago           2    2
Los Angeles       1    1
New York          4    4
San Francisco     2    2


In [19]:
df.dtypes

Name    object
Age      int64
City    object
dtype: object

## 2

We can give a pandas DataFrame a new index that starts at 1 and increments by 2 for each row by assigning a custom index to `df.index`. The `reindex` method is not suitable here: it aligns rows to their existing labels, so any new label that is missing from the old index produces a row of NaN. Here's a Python function to achieve this:

In [27]:
import pandas as pd

def reindex_dataframe(df):
    # Work on a copy so the caller's DataFrame is left untouched
    df = df.copy()

    # New labels 1, 3, 5, ...: one per row, starting at 1 and stepping by 2
    df.index = range(1, len(df) * 2, 2)

    return df

# Example DataFrame
data = {'A': [10, 20, 30],
        'B': [40, 50, 60],
        'C': [70, 80, 90]}

df = pd.DataFrame(data)
print("Before Reindexing:\n",df)

# Re-index the DataFrame
df = reindex_dataframe(df)

# Display the re-indexed DataFrame
print("After Reindexing:\n",df)


Before Reindexing:
     A   B   C
0  10  40  70
1  20  50  80
2  30  60  90
After Reindexing:
     A   B   C
1  10  40  70
3  20  50  80
5  30  60  90


In this example:
The reindex_dataframe function takes a DataFrame df as input and copies it.
It generates the new index labels 1, 3, 5, ... with the range function, one label per row.
Then, it assigns these labels to df.index, which relabels the rows without changing their data.
Finally, it returns the re-indexed DataFrame.
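
Relabeling rows and calling `reindex` are easy to confuse, so a side-by-side comparison may help. The sketch below uses a small made-up DataFrame: `set_axis` returns a relabeled copy with every row kept, while `reindex` aligns on the existing labels and fills rows for unknown labels with NaN.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})
new_labels = range(1, len(df) * 2, 2)  # 1, 3, 5

# Relabel: keeps every row and only swaps the index labels
relabeled = df.set_axis(new_labels, axis=0)

# Align: keeps only rows whose old label appears in new_labels (here, label 1)
aligned = df.reindex(new_labels)

print(relabeled)
print(aligned)
```

In `aligned`, only label 1 existed in the original index, so the rows for labels 3 and 5 contain NaN and the integer columns are converted to floats.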

## 3

In [28]:
import pandas as pd

def calculate_sum_of_first_three_values(df):

    # Extract the 'Values' column and calculate the sum of the first three values
    values_column = df['Values']
    first_three_values_sum = values_column.head(3).sum()

    # Print the sum to the console
    print(f"The sum of the first three values in the 'Values' column is: {first_three_values_sum}")

data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Call the function to calculate and print the sum of the first three values
calculate_sum_of_first_three_values(df)


The sum of the first three values in the 'Values' column is: 60


The calculate_sum_of_first_three_values function takes a DataFrame df as input.
It extracts the 'Values' column using df['Values'] and calculates the sum of the first three values using .head(3).sum().
Finally, it prints the calculated sum to the console.

## 4

In [30]:
import pandas as pd

def add_word_count_column(df):

    # Count the number of words in each row and create a new 'Word_Count' column
    df['Word_Count'] = df['Text'].str.split().apply(len)


data = {'Text': ['This is a sample sentence.', 'Test case', 'Words count !']}

df = pd.DataFrame(data)

# Call the function to add the 'Word_Count' column
add_word_count_column(df)

# Display the DataFrame with the new 'Word_Count' column
print(df)


                         Text  Word_Count
0  This is a sample sentence.           5
1                   Test case           2
2               Words count !           3


The add_word_count_column function takes a DataFrame df as input.
It uses .str.split().apply(len) to count the whitespace-separated words in each row of the 'Text' column and stores the result in a new 'Word_Count' column.
The function modifies df in place and returns nothing; the calling code then prints the DataFrame with the new 'Word_Count' column.
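
One caveat with `.apply(len)`: it raises a `TypeError` when a row holds a missing value, because `len` cannot be applied to `None` or NaN. A possible variant, sketched below under the assumption that missing text should count as zero words, uses `.str.len()`, which leaves missing rows as NaN instead of failing. The helper name `word_counts` is hypothetical.

```python
import pandas as pd

def word_counts(text):
    # .str.len() returns the list length per row and leaves missing rows as NaN;
    # fillna(0) then treats missing text as zero words (an assumption)
    return text.str.split().str.len().fillna(0).astype(int)

df = pd.DataFrame({'Text': ['This is a sample sentence.', None, 'Test case']})
df['Word_Count'] = word_counts(df['Text'])
print(df)
```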

## 5

In pandas, `DataFrame.size` and `DataFrame.shape` are attributes that provide information about the dimensions of a DataFrame, but they serve slightly different purposes:

1. DataFrame.size:

   - `DataFrame.size` returns the total number of elements in the DataFrame, which is equivalent to the product of the number of rows and the number of columns.
   - It returns a single integer representing the total size of the DataFrame.


2. DataFrame.shape:

   - `DataFrame.shape` returns a tuple representing the dimensions of the DataFrame. The tuple contains two values: the number of rows and the number of columns.
   - It provides a more detailed view of the DataFrame's structure by separating the row and column counts.

In [34]:
import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
   
size = df.size  # Returns 6 (3 rows x 2 columns = 6 elements)
print("Size:",size)
shape = df.shape
print("Shape:",shape)

Size: 6
Shape: (3, 2)


## 6

To read an Excel file in pandas, we can use the `pandas.read_excel()` function. By default it reads the first sheet of the workbook and returns a DataFrame; reading `.xlsx` files also requires an engine package such as `openpyxl` to be installed. Here's how we can use it:


import pandas as pd


df = pd.read_excel('your_excel_file.xlsx')

df

## 7

In [7]:
import pandas as pd

data = {'Email': ['monalisha@gmail.com', 'xyz@example.com', 'abc@example.com']}
df = pd.DataFrame(data)


def extract_username(df):
    df['Username'] = df['Email'].str.split('@').str[0]
    return df

df = extract_username(df)

print(df)


                 Email   Username
0  monalisha@gmail.com  monalisha
1      xyz@example.com        xyz
2      abc@example.com        abc


## 8

In [17]:
import pandas as pd

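data1 = {'A': [3, 8, 6, 2, 9, 8, 9],
         'B': [5, 2, 9, 3, 1, 2, 1],
         'C': [1, 7, 4, 5, 2, 7, 2]}

df = pd.DataFrame(data1)

def new_dataframe(df):
    # Keep rows where column 'A' is greater than 5 and column 'B' is less than 10
    df1 = df[(df['A'] > 5) & (df['B'] < 10)]
    return df1

# Renumber the rows of the filtered result from 0
df1 = new_dataframe(df).reset_index(drop=True)
print(df1)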

   A  B  C
0  8  2  7
1  6  9  4
2  9  1  2
3  8  2  7
4  9  1  2


## 9

In [26]:
import pandas as pd

data = {'Values': [10, 20, 30, 30, 10]}
df = pd.DataFrame(data)

def calculate(df):
    # Return the mean, median, and standard deviation of the 'Values' column
    mean = df['Values'].mean()
    median = df['Values'].median()
    standard_deviation = df['Values'].std()
    return mean, median, standard_deviation

mean, median, std = calculate(df)
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Standard Deviation: {std}")

Mean: 20.0
Median: 20.0
Standard Deviation: 10.0
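
Note that `Series.std()` computes the sample standard deviation by default (`ddof=1`, dividing by n - 1), which is why the result above is 10.0. If the population standard deviation is wanted instead, `ddof=0` can be passed. A brief sketch on the same values:

```python
import pandas as pd

values = pd.Series([10, 20, 30, 30, 10])

sample_std = values.std()            # default ddof=1: divides by n - 1
population_std = values.std(ddof=0)  # divides by n

print(f"Sample std: {sample_std}")
print(f"Population std: {population_std:.4f}")
```

NumPy's `np.std` defaults to `ddof=0`, so the two libraries give different results on the same data unless `ddof` is set explicitly.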


## 10

In [32]:
import pandas as pd


data = {'Date': pd.date_range(start='2023-09-01', periods=10, freq='D'),
        'Sales': [100, 120, 130, 140, 150, 160, 170, 180, 190, 200]}

df = pd.DataFrame(data)

def calculate_moving_average(df, window_size=7):
    df = df.sort_values(by='Date')
    # min_periods=1 averages over however many rows are available,
    # so the first six rows get a value instead of NaN
    df['MovingAverage'] = df['Sales'].rolling(window=window_size, min_periods=1).mean()
    
    return df

df = calculate_moving_average(df)


print(df)


        Date  Sales  MovingAverage
0 2023-09-01    100     100.000000
1 2023-09-02    120     110.000000
2 2023-09-03    130     116.666667
3 2023-09-04    140     122.500000
4 2023-09-05    150     128.000000
5 2023-09-06    160     133.333333
6 2023-09-07    170     138.571429
7 2023-09-08    180     150.000000
8 2023-09-09    190     160.000000
9 2023-09-10    200     170.000000
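
The row-based window above equals a 7-day window only when there is exactly one row per day. For data with missing days, a time-based window is one option: passing an offset string such as `'7D'` makes `rolling` select rows by date rather than by count. The sketch below uses made-up data with a gap to show the difference; the column names `RowBased` and `TimeBased` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical sales with a gap between 2 and 10 September
df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-09-01', '2023-09-02', '2023-09-10']),
    'Sales': [100, 200, 300],
}).sort_values(by='Date')

# Row-based: averages the last 7 rows, regardless of how far apart they are
df['RowBased'] = df['Sales'].rolling(window=7, min_periods=1).mean()

# Time-based: averages only rows within the 7 days ending at each row's date
df['TimeBased'] = df.rolling('7D', on='Date')['Sales'].mean()

print(df)
```

For the last row, the row-based average still includes the two early-September sales, while the time-based average covers only the sale on 10 September.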


## 11

In [33]:
import pandas as pd


data = {'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'])}
df = pd.DataFrame(data)

def add_weekday_column(df):
    
    df['Date'] = pd.to_datetime(df['Date'])
    df['Weekday'] = df['Date'].dt.strftime('%A')
    
    return df

df = add_weekday_column(df)


print(df)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
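
As an alternative to `strftime('%A')`, the `.dt` accessor also offers `day_name()`, which returns English names by default, and `dayofweek`, which gives the weekday as an integer (Monday is 0, Sunday is 6) and is handy for sorting or filtering. A short sketch, with the `WeekdayNumber` column name chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-07'])})

df['Weekday'] = df['Date'].dt.day_name()       # e.g. 'Sunday'
df['WeekdayNumber'] = df['Date'].dt.dayofweek  # Monday=0 ... Sunday=6

# Keep only weekend rows (Saturday=5, Sunday=6)
weekend = df[df['WeekdayNumber'] >= 5]
print(weekend)
```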


## 12

In [35]:
import pandas as pd

data = {'Date': ['2023-01-03','2023-03-03','2023-04-08','2023-01-04','2023-01-09','2023-01-01','2023-01-07','2023-01-18','2023-01-13','2023-01-11',]}

df = pd.DataFrame(data)

def select_rows_between_dates(df, start_date, end_date):

    df['Date'] = pd.to_datetime(df['Date'])
    
    
    selected_rows = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
    
    return selected_rows


start_date = pd.to_datetime('2023-01-01')
end_date = pd.to_datetime('2023-01-31')
selected_df = select_rows_between_dates(df, start_date, end_date)

print(selected_df)


        Date
0 2023-01-03
3 2023-01-04
4 2023-01-09
5 2023-01-01
6 2023-01-07
7 2023-01-18
8 2023-01-13
9 2023-01-11
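
The same inclusive range filter can also be written with `Series.between`, which includes both endpoints by default. The sketch below is one possible variant: it parses the dates into a separate Series so that the caller's DataFrame is not modified. The function name `select_rows_between` and the sample dates are hypothetical.

```python
import pandas as pd

def select_rows_between(df, start_date, end_date):
    # Parse into a separate Series so the input DataFrame is left unchanged
    dates = pd.to_datetime(df['Date'])
    # between() is inclusive on both ends by default
    return df[dates.between(start_date, end_date)]

df = pd.DataFrame({'Date': ['2023-01-03', '2023-03-03', '2023-01-31', '2023-02-01']})
january = select_rows_between(df, pd.Timestamp('2023-01-01'), pd.Timestamp('2023-01-31'))
print(january)
```

Because the parsed dates are kept separate, the 'Date' column in both `df` and `january` still holds the original strings.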


## 13

To use the basic functions of pandas in Python, we need to import the pandas library. 

import pandas as pd