In [None]:
# Q1. List any five functions of the pandas library with execution.

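Here are five commonly used functions in the Pandas library, each with an example of how to call it. The examples assume a
CSV file named `data.csv` in the working directory; the `groupby()` example also assumes it has `category` and `price` columns: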

1. `read_csv()`: This function is used to read data from a CSV file and create a DataFrame.


import pandas as pd

# Read data from a CSV file
df = pd.read_csv('data.csv')

# Display the DataFrame
print(df)

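The example above needs a `data.csv` file on disk. As a self-contained sketch, `read_csv()` also accepts any file-like
object, so the hypothetical CSV text below (standing in for `data.csv`) is wrapped in `io.StringIO` and parsed directly.
Pandas infers a numeric dtype for the `price` column.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """category,price
fruit,1.5
fruit,2.5
vegetable,3.0
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)
```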

2. `head()`: This function returns the first few rows of a DataFrame. By default, it returns the first five rows.


import pandas as pd

# Read data from a CSV file
df = pd.read_csv('data.csv')

# Display the first 3 rows
print(df.head(3))

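A minimal sketch on a hypothetical 10-row DataFrame built in memory, contrasting the default `head()` (five rows) with an
explicit row count:

```python
import pandas as pd

# Hypothetical 10-row DataFrame
df = pd.DataFrame({'id': range(10), 'score': [x * 10 for x in range(10)]})

first_default = df.head()   # n defaults to 5
first_three = df.head(3)    # explicit n

print(first_default)
print(first_three)
```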

3. `info()`: This function provides a summary of the DataFrame, including the data types, number of non-null values, and memory usage.


import pandas as pd

# Read data from a CSV file
df = pd.read_csv('data.csv')

# Display the summary information
df.info()

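`info()` prints its summary and returns `None`. A minimal sketch, assuming a hypothetical in-memory DataFrame with one
missing value, that captures the summary as a string through the `buf` parameter so the non-null counts can be inspected:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data with one missing price
df = pd.DataFrame({'category': ['fruit', 'fruit', 'vegetable'],
                   'price': [1.5, np.nan, 3.0]})

buffer = io.StringIO()
result = df.info(buf=buffer)  # writes the summary into buffer instead of stdout
summary = buffer.getvalue()

print(summary)
```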
4. `describe()`: This function generates descriptive statistics for the numeric columns of the DataFrame (by default),
including count, mean, standard deviation, minimum, maximum, and the 25th, 50th (median), and 75th percentiles.


import pandas as pd

# Read data from a CSV file
df = pd.read_csv('data.csv')

# Generate descriptive statistics
description = df.describe()
print(description)

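To show which columns `describe()` covers, here is a sketch on hypothetical in-memory data: by default only the numeric
`price` column is summarised, while `include='all'` adds count, unique, top, and freq for the text column.

```python
import pandas as pd

# Hypothetical data: one text column and one numeric column
df = pd.DataFrame({'category': ['a', 'a', 'b', 'b'],
                   'price': [10.0, 20.0, 30.0, 40.0]})

numeric_stats = df.describe()            # numeric columns only (default)
all_stats = df.describe(include='all')   # also summarises the text column

print(numeric_stats)
print(all_stats)
```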

5. `groupby()`: This function is used to group data based on one or more columns and perform operations on those groups. 
It's commonly used for data aggregation and summarization.


import pandas as pd

# Read data from a CSV file
df = pd.read_csv('data.csv')

# Group the DataFrame by the 'category' column and calculate the average of 'price' for each group
grouped = df.groupby('category')['price'].mean()
print(grouped)

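A self-contained sketch of the same grouping on hypothetical in-memory data, plus `agg()` to compute several statistics per
group in one call:

```python
import pandas as pd

# Hypothetical data with two categories
df = pd.DataFrame({'category': ['fruit', 'fruit', 'vegetable', 'vegetable', 'vegetable'],
                   'price': [1.0, 3.0, 2.0, 4.0, 6.0]})

# Same pattern as above: mean price per category
mean_price = df.groupby('category')['price'].mean()

# Several aggregations at once with agg()
summary = df.groupby('category')['price'].agg(['count', 'mean', 'max'])

print(mean_price)
print(summary)
```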

These functions demonstrate some of the fundamental operations in Pandas, such as reading data, exploring the structure 
of the DataFrame, generating summary statistics, and performing group-based operations.

In [None]:
# Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
# DataFrame with a new index that starts from 1 and increments by 2 for each row.



You can re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row by building a
`pd.RangeIndex` and assigning it to the DataFrame's `index` attribute, using `reset_index()` to discard the old index first.
Here's a Python function that performs this re-indexing:


import pandas as pd

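def reindex_dataframe(df):
    # stop is exclusive, so this yields len(df) labels: 1, 3, 5, ..., 2 * len(df) - 1
    new_index = pd.RangeIndex(start=1, stop=len(df) * 2, step=2)
    # Returns a new DataFrame with a default 0-based index; the caller's df is not modified
    df = df.reset_index(drop=True)
    # Replace the default index with the new one
    df.index = new_index
    return df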

# Example usage
df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})

# Re-index the DataFrame
df_reindexed = reindex_dataframe(df)

# Display the re-indexed DataFrame
print(df_reindexed)



In the `reindex_dataframe()` function, we first create a new index using the `pd.RangeIndex` class. The `start` parameter
is set to 1 and the `step` parameter is set to 2. Because `stop` is exclusive, setting it to `len(df) * 2` yields exactly
`len(df)` labels (1, 3, 5, ..., `2 * len(df) - 1`), one per row.

Next, `reset_index(drop=True)` discards the existing index instead of inserting it as a column, and gives the DataFrame a
default numerical index starting from 0. It returns a new DataFrame, so the caller's original `df` is not modified.

Finally, we assign the new index to the DataFrame using the `index` attribute.

The resulting DataFrame, `df_reindexed`, has a new index starting from 1 and incrementing by 2 for each row.
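An alternative sketch (not the approach used above) assigns the same labels with `DataFrame.set_axis()`, which returns a new
DataFrame and leaves the original untouched. The function name `reindex_with_set_axis` is hypothetical.

```python
import pandas as pd

def reindex_with_set_axis(df):
    # range(1, 2 * n, 2) yields exactly n labels: 1, 3, 5, ..., 2n - 1
    new_labels = range(1, 2 * len(df), 2)
    # set_axis returns a new DataFrame with the given row labels
    return df.set_axis(new_labels, axis=0)

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
df_reindexed = reindex_with_set_axis(df)
print(df_reindexed)
```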

In [5]:
# Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
# iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
# function should print the sum to the console.

# For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should
# calculate and print the sum of the first three values, which is 60.
!pip install pandas
import pandas as pd

def calculate_sum_first_three_values(df):
    sum_values = 0
    count = 0

    for value in df['Values']:
        sum_values += value
        count += 1

        if count == 3:
            break

    print("Sum of the first three values:", sum_values)

# Example usage
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# Calculate the sum of the first three values
calculate_sum_first_three_values(df)






Sum of the first three values: 60
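The loop answers the question as posed, which asks for iteration. Outside that constraint, a vectorized sketch selects the
first three positions with `iloc` and sums them; like the loop, it sums fewer values if the column has fewer than three.
The function name `sum_first_three_vectorized` is hypothetical.

```python
import pandas as pd

def sum_first_three_vectorized(df):
    # iloc[:3] takes the first three rows by position, regardless of index labels
    total = df['Values'].iloc[:3].sum()
    print("Sum of the first three values:", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
result = sum_first_three_vectorized(df)
```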


In [6]:
# Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
# 'Word_Count' that contains the number of words in each row of the 'Text' column.



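import pandas as pd

def add_word_count_column(df):
    # Convert each entry to a string, split it on whitespace, and count the resulting tokens
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))
    return df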

# Example usage
df = pd.DataFrame({'Text': ['Hello, how are you?', 'I am doing well.', 'Python programming is fun.', 'Hey buddy this is rajnish']})

# Add the 'Word_Count' column
df = add_word_count_column(df)

# Display the updated DataFrame
print(df)


                         Text  Word_Count
0         Hello, how are you?           4
1            I am doing well.           4
2  Python programming is fun.           4
3   Hey buddy this is rajnish           5

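A vectorized sketch of the same word count uses the `.str` accessor: `str.split()` splits each entry on whitespace and
`str.len()` returns the length of each resulting list. Unlike the `str(x)` approach above, a missing value stays missing
(NaN) instead of being counted as one word (for example `'nan'`).

```python
import pandas as pd

df = pd.DataFrame({'Text': ['Hello, how are you?', 'I am doing well.',
                            'Python programming is fun.', 'Hey buddy this is rajnish']})

# Split on whitespace, then count the tokens in each resulting list
df['Word_Count'] = df['Text'].str.split().str.len()

print(df)
```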

In [None]:
# Q5. How are DataFrame.size() and DataFrame.shape() different?