
## Q1. List any five functions of the pandas library with execution.

1. **`pd.DataFrame()`**: Used to create a DataFrame from various sources like lists, dictionaries, or NumPy arrays.

```python
import pandas as pd

# Create a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 22, 28]}
df = pd.DataFrame(data)

print(df)
```

2. **`df.head()`**: Returns the first few rows (by default, 5 rows) of the DataFrame.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 22, 28]}
df = pd.DataFrame(data)

print(df.head())
```

3. **`df.info()`**: Prints a concise summary of the DataFrame, including each column's data type and non-null count.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 22, 28]}
df = pd.DataFrame(data)

# info() prints the summary itself and returns None, so no print() is needed
df.info()
```

4. **`df.groupby()`**: Used for grouping the DataFrame by one or more columns and applying aggregate functions.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob'],
        'Score': [85, 92, 78, 90, 88]}
df = pd.DataFrame(data)

grouped_df = df.groupby('Name').mean()
print(grouped_df)
```

5. **`df.drop()`**: Used to remove rows or columns from the DataFrame.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 22, 28]}
df = pd.DataFrame(data)

# Drop the 'Name' column
df = df.drop('Name', axis=1)

print(df)
```
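
The description of `df.drop()` mentions rows as well as columns, while the example above only drops a column. As a minimal sketch on the same sample data, a row can be removed by passing its index label. The label `1` is an assumption here: it is the default integer index of the second row, and the variable name `df_without_bob` is only illustrative.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 22, 28]}
df = pd.DataFrame(data)

# Drop the row whose index label is 1 (Bob).
# drop() returns a new DataFrame; df itself is left unchanged.
df_without_bob = df.drop(index=1)

print(df_without_bob)
```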

## Q2. Python function to re-index a DataFrame with a new index starting from 1, incrementing by 2 for each row.

```python
import pandas as pd

def reindex_dataframe(df):
    df.index = range(1, len(df) * 2, 2)
    return df

# Sample DataFrame
data = {'A': [10, 20, 30],
        'B': [40, 50, 60],
        'C': [70, 80, 90]}
df = pd.DataFrame(data)

# Call the function to reindex the DataFrame
reindexed_df = reindex_dataframe(df)

print(reindexed_df)
```
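
Note that the function above assigns to `df.index`, so the caller's DataFrame is modified as well. If that side effect is not wanted, one possible variant works on a copy. This is only a sketch: the name `reindex_dataframe_copy` and the sample data are hypothetical.

```python
import pandas as pd

def reindex_dataframe_copy(df):
    # Hypothetical variant: work on a copy so the input keeps its index
    result = df.copy()
    # Odd labels 1, 3, 5, ... with exactly one label per row
    result.index = pd.RangeIndex(start=1, stop=2 * len(df) + 1, step=2)
    return result

original = pd.DataFrame({'A': [10, 20, 30]})
reindexed = reindex_dataframe_copy(original)

print(list(original.index))   # index of the input, untouched
print(list(reindexed.index))  # new odd-numbered index
```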

## Q3. Python function to calculate the sum of the first three values in the 'Values' column.

```python
import pandas as pd

def sum_of_first_three(df):
    # .iloc selects by position, so this works whatever the index labels are
    first_three_sum = df['Values'].iloc[:3].sum()
    print("Sum of the first three values:", first_three_sum)

# Sample DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Call the function to calculate the sum
sum_of_first_three(df)
```

## Q4. Python function to create a new column 'Word_Count' containing the number of words in each row of the 'Text' column.

```python
import pandas as pd

def count_words(df):
    df['Word_Count'] = df['Text'].apply(lambda x: len(x.split()))
    return df

# Sample DataFrame
data = {'Text': ['This is a sample text.',
                 'Pandas is a powerful library.',
                 'Data analysis is fun!']}
df = pd.DataFrame(data)

# Call the function to add 'Word_Count' column
df = count_words(df)

print(df)
```
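
The `lambda` above calls `x.split()` on every value, which raises an error if a row holds a missing value instead of a string. A possible alternative, sketched here under the assumption that missing text should count as zero words, uses the vectorized `.str` accessor. The name `count_words_safe` and the sample rows are hypothetical.

```python
import pandas as pd

def count_words_safe(df):
    # Hypothetical variant of count_words that tolerates missing text
    result = df.copy()
    # .str.split() gives a list of words per row (missing values stay missing);
    # .str.len() then measures the length of each list
    counts = result['Text'].str.split().str.len()
    # Assumption: treat missing text as zero words
    result['Word_Count'] = counts.fillna(0).astype(int)
    return result

sample = pd.DataFrame({'Text': ['This is a sample text.',
                                None,
                                'Data analysis is fun!']})
print(count_words_safe(sample))
```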

## Q5. Difference between DataFrame.size and DataFrame.shape

- `DataFrame.size`: This attribute (not a method, so it takes no parentheses) returns the total number of elements in the DataFrame, which equals the number of rows multiplied by the number of columns.

```python
import pandas as pd

data = {'A': [1, 2, 3],
        'B': [4, 5, 6]}
df = pd.DataFrame(data)

print(df.size)  # Output: 6
```

- `DataFrame.shape`: This attribute returns a tuple with the dimensions of the DataFrame, in the order (number of rows, number of columns).

```python
import pandas as pd

data = {'A': [1, 2, 3],
        'B': [4, 5, 6]}
df = pd.DataFrame(data)

print(df.shape)  # Output: (3, 2)
```
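
The relationship between the two can be checked directly. The sketch below reuses the same sample data and also shows what happens when `size` is called with parentheses; the variable names `rows` and `cols` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

rows, cols = df.shape
# size is the product of the two entries of shape
print(df.size == rows * cols)

# size is an attribute holding an integer, so calling it like a
# method fails because an integer is not callable
try:
    df.size()
except TypeError as exc:
    print("TypeError:", exc)
```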

## Q6. Function to read an Excel file using pandas.

```python
import pandas as pd

def read_excel_file(filename):
    df = pd.read_excel(filename)
    return df

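# Call the function to read the Excel file
# (assumes 'data.xlsx' exists in the working directory and that an
# Excel engine such as openpyxl is installed)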
dataframe = read_excel_file('data.xlsx')
print(dataframe)
```

## Q7. Python function to extract the username from the 'Email' column and create a new 'Username' column.

```python
import pandas as pd

def extract_username(df):
    df['Username'] = df['Email'].str.split('@').str[0]
    return df

# Sample DataFrame
data = {'Email': ['john.doe@example.com',
                  'alice.smith@example.com',
                  'bob_johnson@example.com']}
df = pd.DataFrame(data)

# Call the function to extract the username
df = extract_username(df)

print(df)
```

## Q8. Python function to select rows where column 'A' > 5 and column 'B' < 10.

```python
import pandas as pd

def select_rows(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

# Sample DataFrame
data = {'A': [3, 8, 6, 2, 9],
        'B': [5, 2, 9, 3, 1],
        'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

# Call the function to select rows
selected_df = select_rows(df)

print(selected_df)
```

## Q9. Python function to calculate the mean, median, and standard deviation of the 'Values' column.

```python
import pandas as pd

def calculate_stats(df):
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_value = df['Values'].std()
    return mean_value, median_value, std_value

# Sample DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Call the function to calculate statistics
mean_val, median_val, std_val = calculate_stats(df)

print("Mean:", mean_val)
print("Median:", median_val)
print("Standard Deviation:", std_val)
```
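
One detail worth knowing about the result above: `Series.std()` uses `ddof=1` by default, so it returns the sample standard deviation (dividing by n - 1). If the population standard deviation is wanted instead, `ddof=0` can be passed. A short sketch on the same values, with illustrative variable names:

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

# Default: sample standard deviation (ddof=1, divides by n - 1)
sample_std = values.std()
# Population standard deviation (ddof=0, divides by n)
population_std = values.std(ddof=0)

print("Sample std:", sample_std)
print("Population std:", population_std)
```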

## Q10. Python function to calculate the moving average of sales for the past 7 days.

```python
import pandas as pd

def moving_average(df):
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df

# Sample DataFrame
data = {'Date': pd.date_range('2023-01-01', periods=10, freq='D'),
        'Sales': [50, 45, 60, 55, 70, 80, 75, 90, 85, 100]}
df = pd.DataFrame(data)

# Call the function to calculate the moving average
df = moving_average(df)

print(df)
```
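
The `min_periods=1` argument in the function above means the first six rows are averaged over however many days are available so far. Without it, `rolling(window=7)` yields `NaN` until a full seven-day window exists. The sketch below compares both on the same sales figures; the column names `Partial` and `Strict` are only illustrative.

```python
import pandas as pd

sales = pd.Series([50, 45, 60, 55, 70, 80, 75, 90, 85, 100])

# Averages whatever values are available in the window so far
partial = sales.rolling(window=7, min_periods=1).mean()
# Default min_periods equals the window size: NaN until 7 values exist
strict = sales.rolling(window=7).mean()

comparison = pd.DataFrame({'Sales': sales,
                           'Partial': partial,
                           'Strict': strict})
print(comparison)
```

Which behaviour is appropriate depends on whether a partial-window average counts as a valid "7-day" figure for the analysis at hand.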

## Q11. Python function to create a new 'Weekday' column in the DataFrame.

```python
import pandas as pd

def add_weekday(df):
    df['Weekday'] = df['Date'].dt.day_name()
    return df

# Sample DataFrame
data = {'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'])}
df = pd.DataFrame(data)

# Call the function to add the 'Weekday' column
df = add_weekday(df)

print(df)
```

## Q12. Python function to select rows where the date is between '2023-01-01' and '2023-01-31'.

```python
import pandas as pd

def select_rows_between_dates(df):
    start_date = '2023-01-01'
    end_date = '2023-01-31'
    mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
    selected_rows = df.loc[mask]
    return selected_rows

# Sample DataFrame
data = {'Date': pd.date_range('2022-12-20', periods=40, freq='D')}
df = pd.DataFrame(data)

# Call the function to select rows between dates
selected_df = select_rows_between_dates(df)

print(selected_df)
```
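
The same selection can also be written with `Series.between`, which is inclusive on both ends by default. The sketch below additionally takes the bounds as arguments rather than hard-coding them; the function name `select_rows_between` and its parameters are hypothetical.

```python
import pandas as pd

def select_rows_between(df, start_date, end_date):
    # Hypothetical generalisation: bounds are arguments, both inclusive
    mask = df['Date'].between(start_date, end_date)
    return df.loc[mask]

dates = pd.DataFrame({'Date': pd.date_range('2022-12-20', periods=40, freq='D')})
january = select_rows_between(dates, '2023-01-01', '2023-01-31')

print(january['Date'].min(), january['Date'].max())
```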

## Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

The library to import is pandas itself. By convention it is imported under the alias `pd`:

```python
import pandas as pd
```