Pandas Assignment-2

Q1)

Pandas is a popular Python library for data manipulation and analysis. Here are five commonly used pandas functions, each with a short example:

1. **`pandas.DataFrame()`**: This constructor creates a DataFrame, a 2-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}

df = pd.DataFrame(data)
print(df)
```

2. **`df.head(n)`**: This method returns the first `n` rows of a DataFrame (5 if `n` is omitted). It is helpful for quickly inspecting the data.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}

df = pd.DataFrame(data)
print(df.head(2))
```

3. **`df.describe()`**: This method generates descriptive statistics for each numeric column: count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum.

```python
import pandas as pd

data = {'Age': [25, 30, 35, 28, 32, 27]}
df = pd.DataFrame(data)
print(df.describe())
```

4. **`df.groupby()`**: This method groups rows by the values of one or more columns. It is usually combined with aggregation functions such as `sum()`, `mean()`, or `count()`.

```python
import pandas as pd

data = {'City': ['New York', 'Los Angeles', 'New York', 'Chicago'],
        'Population': [8.4, 3.9, 8.4, 2.7]}

df = pd.DataFrame(data)
grouped = df.groupby('City')['Population'].sum()
print(grouped)
```

5. **`df.plot()`**: This method creates basic plots directly from a DataFrame. By default it uses Matplotlib as the plotting backend, so Matplotlib must be installed.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'Year': [2010, 2015, 2020, 2025],
        'GDP (in trillions)': [14.6, 17.4, 21.3, 26.0]}

df = pd.DataFrame(data)
df.plot(x='Year', y='GDP (in trillions)', kind='line')
plt.xlabel('Year')
plt.ylabel('GDP (in trillions)')
plt.title('GDP Growth Over Time')
plt.show()
```

These are only a few of the many functions pandas provides for data manipulation and analysis.

Q2)

In [1]:
import pandas as pd

def reindex_with_increment(df):
    # Create a new index using a range starting from 1 and incrementing by 2
    new_index = pd.RangeIndex(start=1, stop=1 + 2 * len(df), step=2)
    
    # Set the new index to the DataFrame
    df = df.set_index(new_index)
    
    return df

# Example usage:
data = {'A': [10, 20, 30, 40],
        'B': [50, 60, 70, 80],
        'C': [90, 100, 110, 120]}

df = pd.DataFrame(data)

new_df = reindex_with_increment(df)
print(new_df)


    A   B    C
1  10  50   90
3  20  60  100
5  30  70  110
7  40  80  120


Q3)

In [2]:
import pandas as pd

def sum_first_three_values(df):
    total = 0
    # head(3) selects the first three rows by position, so the loop
    # works even when the index labels are not 0, 1, 2, ...
    for _, row in df.head(3).iterrows():
        total += row['Values']
    
    print("Sum of the first three values:", total)

# Example usage:
data = {'Values': [10, 20, 30, 40, 50, 60]}
df = pd.DataFrame(data)

sum_first_three_values(df)


Sum of the first three values: 60


Q4)

In [3]:
import pandas as pd

def count_words(text):
    # Split the text on whitespace to count words
    words = text.split()
    return len(words)

def add_word_count_column(df):
    # Create a new column 'Word_Count' by applying the count_words function to each row
    df['Word_Count'] = df['Text'].apply(count_words)

# Example usage:
data = {'Text': ["This is a sample sentence.", "Another example.", "Pandas is great."]}
df = pd.DataFrame(data)

add_word_count_column(df)
print(df)


                         Text  Word_Count
0  This is a sample sentence.           5
1            Another example.           2
2            Pandas is great.           3


Q5)

DataFrame.size:

DataFrame.size returns the total number of elements (cells) in the DataFrame, which is the product of the number of rows and the number of columns.


DataFrame.shape:

DataFrame.shape returns a tuple of the form (rows, columns) that contains the number of rows and the number of columns in the DataFrame.
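
As a minimal sketch of the difference, the following uses a small made-up DataFrame (the column names and values are only for illustration) with 3 rows and 2 columns:

```python
import pandas as pd

# Illustrative data: 3 rows, 2 columns
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

print(df.shape)  # (3, 2) -> (rows, columns)
print(df.size)   # 6 -> rows * columns

# size is always the product of the two entries of shape
rows, cols = df.shape
print(rows * cols == df.size)
```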






Q6)

To read an Excel file in Pandas, you can use the pandas.read_excel() function. It reads data from Excel files (both .xls and .xlsx formats) and returns a DataFrame. Reading .xlsx files requires an Excel engine package such as openpyxl to be installed.
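
A minimal sketch of how this might look in practice. The helper name `load_excel` and the file name `sales.xlsx` are hypothetical and used only for illustration; `sheet_name` is a real parameter of `pandas.read_excel()`.

```python
import pandas as pd

def load_excel(path, sheet_name=0):
    """Read one worksheet of an Excel file into a DataFrame."""
    # sheet_name accepts a 0-based position or a sheet name;
    # 0 selects the first sheet in the workbook
    return pd.read_excel(path, sheet_name=sheet_name)

# Example usage (hypothetical file name):
# df = load_excel("sales.xlsx")
# print(df.head())
```

The usage lines are left commented out because they depend on a file that exists on your machine.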

Q7)

In [3]:
import pandas as pd

def extract_username(df):
    # Split the 'Email' column at the '@' symbol and select the first part
    df['Username'] = df['Email'].str.split('@').str[0]

# Example usage:
data = {'Email': ['user1@example.com', 'user2@example.org', 'user3@example.net']}
df = pd.DataFrame(data)

extract_username(df)
print(df)


               Email Username
0  user1@example.com    user1
1  user2@example.org    user2
2  user3@example.net    user3


Q8)

In [4]:
import pandas as pd

def select_rows(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

# Example usage:
data = {'A': [3, 8, 6, 2, 9],
        'B': [5, 2, 9, 3, 1],
        'C': [1, 7, 4, 5, 2]}

df = pd.DataFrame(data)

selected_df = select_rows(df)
print(selected_df)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2


Q9)

In [5]:
import pandas as pd

def calculate_statistics(df):
    # Calculate mean, median, and standard deviation
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_deviation = df['Values'].std()
    
    return mean_value, median_value, std_deviation

# Example usage:
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

mean, median, std = calculate_statistics(df)
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Standard Deviation: {std}")


Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896


Q10)

In [1]:
import pandas as pd

def calculate_moving_average(df):
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df

# Example usage:
# Assuming df has 'Sales' and 'Date' columns, with one row per day
# Convert 'Date' to datetime and sort by it, so that the 7-row window
# covers the current day and the six days before it
# df['Date'] = pd.to_datetime(df['Date'])
# df = df.sort_values('Date')
# df = calculate_moving_average(df)
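
The function above can be tried on a small made-up dataset. The sketch below assumes ten consecutive days of illustrative sales figures; the `MovingAverage_7D` column is an addition for comparison, not part of the original answer. It uses a time-based `'7D'` window, which counts calendar days instead of rows and therefore also copes with missing days.

```python
import pandas as pd

def calculate_moving_average(df):
    # Row-based window: the current row plus up to six preceding rows
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df

# Illustrative data: ten consecutive days, already sorted by date
df = pd.DataFrame({
    'Date': pd.date_range('2023-01-01', periods=10, freq='D'),
    'Sales': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

df = calculate_moving_average(df)

# Alternative (assumption, for comparison): a time-based 7-day window.
# This requires a sorted datetime index on the Series being rolled.
time_based = df.set_index('Date')['Sales'].rolling('7D').mean()
df['MovingAverage_7D'] = time_based.to_numpy()

print(df)
```

With one row per day and no gaps, both columns should agree; they differ only when days are missing from the data.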


Q11)

In [2]:
import pandas as pd

def calculate_moving_average(df):
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df

# Example usage:
# Assuming df is your DataFrame with 'Sales' and 'Date' columns
# Make sure your 'Date' column is in datetime format
# df['Date'] = pd.to_datetime(df['Date'])
# Call the function to calculate the moving average
# df = calculate_moving_average(df)


Q12)

In [3]:
import pandas as pd

def select_rows_in_date_range(df):
    # Convert the date column to datetime if it's not already
    if not pd.api.types.is_datetime64_ns_dtype(df['Date']):
        df['Date'] = pd.to_datetime(df['Date'])

    # Define the start and end dates for the range
    start_date = pd.Timestamp('2023-01-01')
    end_date = pd.Timestamp('2023-01-31')

    # Use boolean indexing to select rows within the date range
    selected_rows = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]

    return selected_rows

# Example usage:
# Assuming df is your DataFrame with a 'Date' column
# (the function converts 'Date' to datetime if it is not already)
# selected_df = select_rows_in_date_range(df)
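
A runnable sketch of the same idea, under a few assumptions: the helper name `select_between`, its `start`/`end`/`column` parameters, and the sample data are all hypothetical. It takes the range as arguments instead of hard-coding it and uses `Series.between()`, which is inclusive on both ends by default.

```python
import pandas as pd

def select_between(df, start, end, column='Date'):
    # Parse the column as datetimes without modifying the caller's DataFrame
    dates = pd.to_datetime(df[column])
    # Keep rows whose date lies in [start, end], both ends included
    mask = dates.between(pd.Timestamp(start), pd.Timestamp(end))
    return df[mask]

# Illustrative data: two dates fall outside January 2023
sample = pd.DataFrame({
    'Date': ['2022-12-31', '2023-01-01', '2023-01-15', '2023-01-31', '2023-02-01'],
    'Sales': [5, 10, 15, 20, 25],
})

january = select_between(sample, '2023-01-01', '2023-01-31')
print(january)
```

Passing the dates as parameters keeps the function reusable for other ranges, and building the mask from a converted copy leaves the original 'Date' column unchanged.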


Q13)

To use the basic functions of Pandas, the only library you need to import is Pandas itself, conventionally with `import pandas as pd`.