#Q1
Here are five commonly used pandas functions and methods, each with a short usage example:

1. **`read_csv`**: This function is used to read data from a CSV file into a pandas DataFrame.

```python
import pandas as pd

# Read CSV file into a DataFrame
data = pd.read_csv('data.csv')

# Display the first few rows of the DataFrame
print(data.head())
```

2. **`info`**: This DataFrame method prints a concise summary of a DataFrame, including each column's data type and number of non-null values.

```python
# Display summary information about the DataFrame
data.info()
```

3. **`groupby`**: This DataFrame method groups rows by the values in one or more columns so that aggregate functions (such as `mean` or `sum`) can be applied to each group.

```python
# Group data by 'category' and calculate the mean of 'value' column for each group
grouped_data = data.groupby('category')['value'].mean()

# Display the result
print(grouped_data)
```

4. **`fillna`**: This function is used to fill missing values in a DataFrame with specified values.

```python
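# Fill missing values in the 'age' column with the mean age.
# The result is assigned back to the column: calling fillna(inplace=True) on a
# selected column is chained assignment, which recent pandas versions warn about
# and which may leave 'data' unchanged.
data['age'] = data['age'].fillna(data['age'].mean())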

# Display the DataFrame with filled values
print(data.head())
```

5. **`pivot_table`**: This function creates a pivot table from a DataFrame, allowing you to summarize and analyze data in a tabular format.

```python
# Create a pivot table that shows the mean 'value' for each 'category' and 'month'
pivot_table = data.pivot_table(values='value', index='category', columns='month', aggfunc='mean')

# Display the pivot table
print(pivot_table)
```

These examples assume that a file named `data.csv` exists, that it has been loaded into a DataFrame named `data`, and that it contains the columns used above (`category`, `value`, `age`, and `month`). They also assume pandas has been imported with `import pandas as pd`. Adjust the file name and column names to match your own data.
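
Because the snippets above depend on a `data.csv` file that is not shown here, the following is a minimal self-contained sketch that runs the same five calls on a small in-memory CSV. The sample rows and the variable names `csv_text`, `category_means`, and `summary` are made up for illustration only.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for 'data.csv' (one age is missing)
csv_text = """category,month,value,age
a,Jan,10,30
a,Feb,20,
b,Jan,30,40
b,Feb,40,50
"""

# read_csv accepts a file-like object as well as a file path
data = pd.read_csv(io.StringIO(csv_text))

# info() prints its summary directly and returns None
data.info()

# Mean of 'value' for each category
category_means = data.groupby('category')['value'].mean()

# Replace the missing age with the mean of the known ages
data['age'] = data['age'].fillna(data['age'].mean())

# Mean 'value' for each (category, month) pair
summary = data.pivot_table(values='value', index='category',
                           columns='month', aggfunc='mean')

print(category_means)
print(summary)
```

Reading from `io.StringIO` keeps the sketch runnable without any file on disk; with a real file, pass its path to `pd.read_csv` instead.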


In [2]:
import pandas as pd

In [3]:
#Q2
def reindex_with_increment(df):
    new_index = pd.Index(range(1, len(df)*2, 2))
    df_reindexed = df.copy()
    df_reindexed.index = new_index
    return df_reindexed

# Example DataFrame with columns 'A', 'B', and 'C'
data = {'A': [10, 20, 30],
        'B': [15, 25, 35],
        'C': [100, 200, 300]}

df = pd.DataFrame(data)

# Call the function to re-index the DataFrame
reindexed_df = reindex_with_increment(df)

# Display the re-indexed DataFrame
print(reindexed_df)


    A   B    C
1  10  15  100
3  20  25  200
5  30  35  300


In [14]:
#Q3
def calculate_sum_of_first_three(df):
    # Get the first three values in the 'Values' column
    first_three_values = df['Values'].iloc[:3]

    # Calculate the sum of the first three values
    sum_first_three = first_three_values.sum()

    # Print the sum to the console
    print("Sum of the first three values:", sum_first_three)

# Example DataFrame with a 'Values' column
data = {'Values': [10, 20, 30, 40, 50]}

df = pd.DataFrame(data)

# Call the function to calculate and print the sum
calculate_sum_of_first_three(df)


Sum of the first three values: 60


In [15]:
#Q4
def add_word_count_column(df):
    # Split each text into words and count the number of words
    df['Word_Count'] = df['Text'].apply(lambda x: len(x.split()))
    return df

# Example DataFrame with a 'Text' column
data = {'Text': ["This is a sample text.",
                 "Another example with more words.",
                 "Short text."]}

df = pd.DataFrame(data)

# Call the function to add the 'Word_Count' column
df_with_word_count = add_word_count_column(df)

# Display the DataFrame with the new 'Word_Count' column
print(df_with_word_count)


                               Text  Word_Count
0            This is a sample text.           5
1  Another example with more words.           5
2                       Short text.           2


#Q5
In pandas, `DataFrame.size` and `DataFrame.shape` are both attributes of a DataFrame, but they provide different information about the structure of the DataFrame:

1. **`DataFrame.size`**:
   - This attribute returns the total number of elements in the DataFrame, which is equal to the product of the number of rows and the number of columns.
   - It gives you the total count of cells in the DataFrame, including cells with missing or NaN values.
   - The value returned by `DataFrame.size` is an integer.

Example:
```python
import pandas as pd

data = {'A': [1, 2, 3],
        'B': [4, 5, 6]}

df = pd.DataFrame(data)

print(df.size)  # Output: 6 (3 rows * 2 columns)
```

2. **`DataFrame.shape`**:
   - This attribute returns a tuple representing the dimensions of the DataFrame.
   - The tuple contains two elements: the number of rows and the number of columns.
   - It provides a concise way to understand the structure of the DataFrame.

Example:
```python
import pandas as pd

data = {'A': [1, 2, 3],
        'B': [4, 5, 6]}

df = pd.DataFrame(data)

print(df.shape)  # Output: (3, 2) (3 rows, 2 columns)
```

In summary, `DataFrame.size` gives you the total number of elements in the DataFrame, while `DataFrame.shape` gives you the dimensions of the DataFrame as a tuple (number of rows, number of columns).
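
As a quick check of the relationship described above, this small sketch (using a made-up DataFrame that contains missing values) shows that `size` equals the product of the two entries of `shape`, and that NaN cells are included in `size` even though `count()` skips them.

```python
import pandas as pd

# Hypothetical DataFrame with two missing cells
df = pd.DataFrame({'A': [1.0, float('nan'), 3.0],
                   'B': [4.0, 5.0, float('nan')]})

rows, cols = df.shape

print(df.shape)               # dimensions as (rows, columns)
print(df.size)                # total number of cells, NaN included
print(rows * cols)            # same value as df.size
print(df.count().sum())       # number of non-missing cells only
```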

#Q6
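# Read Excel file into a DataFrame
# ('file.xlsx' is a placeholder path; reading .xlsx files requires an Excel engine such as openpyxl to be installed)
df = pd.read_excel('file.xlsx')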

# Display the DataFrame
print(df)


In [16]:
#Q7
def extract_username(df):
    # Extract usernames from email addresses using string splitting
    df['Username'] = df['Email'].str.split('@').str[0]
    return df

# Example DataFrame with an 'Email' column
data = {'Email': ['john.doe@example.com',
                  'jane.smith@example.com',
                  'alice.wonderland@example.com']}

df = pd.DataFrame(data)

# Call the function to extract usernames and create the 'Username' column
df_with_username = extract_username(df)

# Display the DataFrame with the new 'Username' column
print(df_with_username)


                          Email          Username
0          john.doe@example.com          john.doe
1        jane.smith@example.com        jane.smith
2  alice.wonderland@example.com  alice.wonderland


In [10]:
#Q8
def select_rows(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

# Example DataFrame with columns 'A', 'B', and 'C'
data = {'A': [3, 8, 6, 2, 9],
        'B': [5, 2, 9, 3, 1],
        'C': [1, 7, 4, 5, 2]}

df = pd.DataFrame(data)

# Call the function to select rows based on the conditions
selected_df = select_rows(df)

# Display the selected DataFrame
print(selected_df)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2


In [17]:
#Q9
def calculate_stats(df):
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_dev = df['Values'].std()
    return mean_value, median_value, std_dev

# Example DataFrame with a 'Values' column
data = {'Values': [10, 20, 30, 40, 50]}

df = pd.DataFrame(data)

# Call the function to calculate statistics
mean, median, std_dev = calculate_stats(df)

# Display the calculated statistics
print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std_dev)


Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896


In [11]:
#Q10
def calculate_moving_average(df):
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df

# Example DataFrame with 'Sales' and 'Date' columns
data = {'Sales': [50, 45, 60, 70, 55, 80, 75, 90, 65, 70],
        'Date': pd.date_range('2023-08-01', periods=10)}

df = pd.DataFrame(data)

# Call the function to calculate the moving average
df_with_moving_avg = calculate_moving_average(df)

# Display the DataFrame with the new 'MovingAverage' column
print(df_with_moving_avg)


   Sales       Date  MovingAverage
0     50 2023-08-01      50.000000
1     45 2023-08-02      47.500000
2     60 2023-08-03      51.666667
3     70 2023-08-04      56.250000
4     55 2023-08-05      56.000000
5     80 2023-08-06      60.000000
6     75 2023-08-07      62.142857
7     90 2023-08-08      67.857143
8     65 2023-08-09      70.714286
9     70 2023-08-10      72.142857


In [13]:
#Q11
def add_weekday_column(df):
    df['Weekday'] = df['Date'].dt.strftime('%A')
    return df

# Example DataFrame with a 'Date' column
data = {'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'])}

df = pd.DataFrame(data)

# Call the function to add the 'Weekday' column
df_with_weekday = add_weekday_column(df)

# Display the DataFrame with the new 'Weekday' column
print(df_with_weekday)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday


In [18]:
#Q12
import pandas as pd

def select_rows_between_dates(df):
    start_date = '2023-01-01'
    end_date = '2023-01-31'
    selected_rows = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
    return selected_rows

# Example DataFrame with a 'Date' column
data = {'Date': pd.to_datetime(['2023-01-15', '2023-01-25', '2023-02-05', '2023-01-10', '2023-02-01'])}

df = pd.DataFrame(data)

# Call the function to select rows between the specified dates
selected_df = select_rows_between_dates(df)

# Display the selected DataFrame
print(selected_df)


        Date
0 2023-01-15
1 2023-01-25
3 2023-01-10


#Q13
To use the basic functions of pandas, you need to import the `pandas` library itself. Here's how you can import it:

```python
import pandas as pd
```

By convention, the `pandas` library is usually imported with the alias `pd`, which makes it easier to refer to pandas functions and classes using the `pd` prefix. Once you've imported the `pandas` library, you can use its functions and classes to work with DataFrames, Series, and various data manipulation and analysis tasks.
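
As a minimal illustration of the `pd` prefix in use, the sketch below builds a Series and a DataFrame after the import. The values and the column name `Values` are arbitrary examples.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
s = pd.Series([10, 20, 30], name='Values')

# A DataFrame is a two-dimensional table; here it has a single column built from the Series
df = s.to_frame()

print(type(s).__name__)
print(type(df).__name__)
print(df['Values'].sum())
```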