# Answer 1

1) read_csv() - used to read data from CSV files and create a pandas DataFrame object.

In [2]:
import pandas as pd

# Reading a CSV file and storing it in a DataFrame object
df = pd.read_csv('sample.csv')

# Displaying the first 5 rows of the DataFrame
print(df.head())

                        Industry
0             Accounting/Finance
1   Advertising/Public Relations
2             Aerospace/Aviation
3  Arts/Entertainment/Publishing
4                     Automotive


2) info() - used to get a concise summary of a DataFrame, including the data type of each column, and the number of non-null values.

In [3]:
import pandas as pd

# Creating a DataFrame object from a dictionary
data = {'name': ['John', 'Alice', 'Bob'],
        'age': [25, 30, 20],
        'city': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)

# Displaying information about the DataFrame
# (info() prints the summary itself and returns None, so print() is not needed)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    3 non-null      object
 1   age     3 non-null      int64 
 2   city    3 non-null      object
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes


3) describe() - used to generate descriptive statistics about a DataFrame, such as count, mean, standard deviation, and quartiles.

In [4]:
import pandas as pd

# Creating a DataFrame object from a dictionary
data = {'age': [25, 30, 20, 35, 40, 22]}
df = pd.DataFrame(data)

# Displaying descriptive statistics about the DataFrame
print(df.describe())

             age
count   6.000000
mean   28.666667
std     7.788881
min    20.000000
25%    22.750000
50%    27.500000
75%    33.750000
max    40.000000


4) drop() - used to drop specified labels from rows or columns of a DataFrame.

In [6]:
import pandas as pd

# Creating a DataFrame object from a dictionary
data = {'name': ['John', 'Alice', 'Bob'],
        'age': [25, 30, 20],
        'city': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)

# Dropping the 'city' column
df = df.drop('city', axis=1)

# Displaying the modified DataFrame
print(df)

    name  age
0   John   25
1  Alice   30
2    Bob   20


5) groupby() - used to group data in a DataFrame based on one or more columns and perform operations on the grouped data.

In [7]:
import pandas as pd

# Creating a DataFrame object from a dictionary
data = {'name': ['John', 'Alice', 'Bob', 'John', 'Bob'],
        'age': [25, 30, 20, 35, 40],
        'city': ['New York', 'London', 'Paris', 'New York', 'Paris']}
df = pd.DataFrame(data)

# Grouping the data by 'name' and calculating the mean age for each group
grouped = df.groupby('name')['age'].mean()

# Displaying the grouped data
print(grouped)

name
Alice    30.0
Bob      30.0
John     30.0
Name: age, dtype: float64


# Answer 2

Here's a Python function that re-indexes a Pandas DataFrame with a new index that starts from 1 and increments by 2 for each row:

In [8]:
import pandas as pd

def reindex_df(df):
    new_index = pd.RangeIndex(start=1, step=2, stop=len(df)*2+1)
    df.index = new_index
    return df

Here's how you can use the function with a sample DataFrame:

In [9]:
# Creating a sample DataFrame
df = pd.DataFrame({'A': [10, 20, 30], 'B': [100, 200, 300], 'C': [1000, 2000, 3000]})

# Re-indexing the DataFrame with the new index
df = reindex_df(df)

# Displaying the modified DataFrame
print(df)

    A    B     C
1  10  100  1000
3  20  200  2000
5  30  300  3000


In the reindex_df() function, we create a new index with pd.RangeIndex(), using a start of 1, a step of 2, and a stop of twice the length of the DataFrame plus 1. This yields the odd integers 1, 3, 5, ..., one per row. We then assign it to df.index, which modifies the DataFrame that was passed in, and return that DataFrame.
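
Because reindex_df() assigns to df.index, it changes the DataFrame passed to it. If the caller needs the original left untouched, one option is to apply the same index to a copy. The sketch below assumes a hypothetical helper name, reindex_copy, which is not part of the answer above.

```python
import pandas as pd

def reindex_copy(df):
    # Hypothetical helper: same odd-numbered index as reindex_df(),
    # but applied to a copy so the caller's DataFrame is unchanged
    out = df.copy()
    out.index = pd.RangeIndex(start=1, stop=2 * len(df) + 1, step=2)
    return out

original = pd.DataFrame({'A': [10, 20, 30]})
reindexed = reindex_copy(original)

print(list(reindexed.index))  # [1, 3, 5]
print(list(original.index))   # [0, 1, 2]
```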

# Answer 3

In [10]:
import pandas as pd

def sum_first_three(df):
    # Slice by position so the function works with any index,
    # and avoid shadowing the built-in sum()
    total = df['Values'].iloc[:3].sum()
    print("Sum of the first three values:", total)

# Create sample DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Call the function
sum_first_three(df)


Sum of the first three values: 60


# Answer 4

In [11]:
import pandas as pd

def add_word_count(df):
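    # split() with no argument splits on any run of whitespace,
    # so repeated spaces are not counted as extra words
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))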
    return df

# Create sample DataFrame
data = {'Text': ['This is a sample text', 'Another text with more words', 'Short text']}
df = pd.DataFrame(data)

# Call the function
df = add_word_count(df)

# Display the updated DataFrame
print(df)


                           Text  Word_Count
0         This is a sample text           5
1  Another text with more words           5
2                    Short text           2
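
The same count can likely be written without apply(), using the vectorized .str accessor. This is a minimal sketch with made-up sample rows; filling missing values with an empty string first (so they count as 0 words) is an assumption about the desired behavior, not something the answer above specifies.

```python
import pandas as pd

df = pd.DataFrame({'Text': ['This is a sample text', 'Two  spaces here', None]})

# str.split() with no argument splits on runs of whitespace;
# str.len() then gives the number of pieces in each row
df['Word_Count'] = df['Text'].fillna('').str.split().str.len()

print(df['Word_Count'].tolist())  # [5, 3, 0]
```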


# Answer 5

DataFrame.size is an attribute (not a method, so it is accessed without parentheses) that returns the total number of elements in a DataFrame, which is equal to the number of rows multiplied by the number of columns. Its value is a single integer.

DataFrame.shape is also an attribute. It returns a tuple of integers giving the dimensions of the DataFrame: the first element is the number of rows, and the second element is the number of columns.

In [12]:
import pandas as pd

# Create sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

# Print the size of the DataFrame
print(df.size)

# Print the shape of the DataFrame
print(df.shape)

6
(3, 2)
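
As a small check of the relationship described above, the sketch below confirms that size equals the product of the two shape entries, and shows that a single column (a Series) has a one-element shape tuple.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

n_rows, n_cols = df.shape
print(df.size == n_rows * n_cols)  # True

# A single column is a one-dimensional Series
col = df['A']
print(col.shape)  # (3,)
print(col.size)   # 3
```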


# Answer 6

To read an Excel file in pandas, we use the read_excel() function. It can read several Excel formats, including .xls and .xlsx, and relies on an optional engine package being installed (for example, openpyxl for .xlsx files).

Here's an example of how to use the read_excel() function to read an Excel file:

In [13]:
import pandas as pd

# Read the Excel file
df = pd.read_excel('excel.xlsx')

In [14]:
df

Unnamed: 0,Order ID,Order Date,Order Quantity,Sales,Ship Mode,Profit,Unit Price,Customer Name,Customer Segment,Product Category
0,3,2010-10-13,6,261.5400,Regular Air,-213.250,38.94,Muhammed MacIntyre,Small Business,Office Supplies
1,6,2012-02-20,2,6.9300,Regular Air,-4.640,2.08,Ruben Dartt,Corporate,Office Supplies
2,32,2011-07-15,26,2808.0800,Regular Air,1054.820,107.53,Liz Pelletier,Corporate,Furniture
3,32,2011-07-15,24,1761.4000,Delivery Truck,-1748.560,70.89,Liz Pelletier,Corporate,Furniture
4,32,2011-07-15,23,160.2335,Regular Air,-85.129,7.99,Liz Pelletier,Corporate,Technology
...,...,...,...,...,...,...,...,...,...,...
1002,7171,2011-02-13,17,303.1865,Regular Air,92.592,20.99,Andy Gerbode,Consumer,Technology
1003,7174,2012-03-10,10,141.9200,Regular Air,12.200,13.73,Thomas Thornton,Consumer,Furniture
1004,7175,2010-02-07,10,748.2500,Delivery Truck,-86.990,70.98,Helen Andreada,Corporate,Furniture
1005,7203,2009-01-08,25,21752.0100,Regular Air,9296.348,896.99,Ruben Dartt,Corporate,Office Supplies


# Answer 7

In [15]:
# Function that extracts the username (the part before '@') from each email address
import pandas as pd

def extract_username(df):
    df['Username'] = df['Email'].apply(lambda x: x.split('@')[0])
    return df

In [16]:
# Create sample DataFrame
data = {'Email': ['ahmad.bader@example.com', 'Sudhanshu.Kumar@example.com', 'Krish.Naik@example.com']}
df = pd.DataFrame(data)

# Call the function
df = extract_username(df)

# Display the updated DataFrame
print(df)

                         Email         Username
0      ahmad.bader@example.com      ahmad.bader
1  Sudhanshu.Kumar@example.com  Sudhanshu.Kumar
2       Krish.Naik@example.com       Krish.Naik


# Answer 8

In [17]:
# Function that selects the rows where column 'A' is greater than 5 and column 'B' is less than 10
import pandas as pd

def select_rows(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

In [18]:
# Create sample DataFrame
data = {'A': [1, 6, 2, 7, 3, 8], 'B': [10, 5, 12, 4, 8, 2], 'C': ['X', 'Y', 'Z', 'W', 'P', 'Q']}
df = pd.DataFrame(data)

# Call the function
selected_df = select_rows(df)

# Display the selected rows
print(selected_df)

   A  B  C
1  6  5  Y
3  7  4  W
5  8  2  Q
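
The same filter can also be expressed with DataFrame.query(), which some readers find easier to scan than a boolean mask. The sketch below is one possible variant; the function name select_rows_query and the parameters a_min and b_max are hypothetical and only serve to show how local variables are referenced with @.

```python
import pandas as pd

def select_rows_query(df, a_min=5, b_max=10):
    # '@name' refers to a local Python variable inside the query string
    return df.query('A > @a_min and B < @b_max')

df = pd.DataFrame({'A': [1, 6, 2, 7, 3, 8],
                   'B': [10, 5, 12, 4, 8, 2],
                   'C': ['X', 'Y', 'Z', 'W', 'P', 'Q']})

selected = select_rows_query(df)
print(list(selected.index))  # [1, 3, 5]
```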


# Answer 9

In [19]:
# Function that computes the mean, median, and standard deviation of the 'Values' column
import pandas as pd

def calculate_statistics(df):
    mean = df['Values'].mean()
    median = df['Values'].median()
    std_dev = df['Values'].std()
    return mean, median, std_dev

In [21]:
# Create sample DataFrame
data = {'Values': [10,20,30,40,50,50,50,60,60,65,66,67,88,88,89,90]}
df = pd.DataFrame(data)

# Call the function
mean, median, std_dev = calculate_statistics(df)

# Display the calculated statistics
print("Mean: ", mean)
print("Median: ", median)
print("Standard Deviation: ", std_dev)


Mean:  57.6875
Median:  60.0
Standard Deviation:  24.540357916433628
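
Note that Series.std() computes the sample standard deviation by default (ddof=1). The sketch below, on a smaller made-up column, shows the three statistics gathered in one agg() call and contrasts the default with the population version (ddof=0).

```python
import pandas as pd

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# One call that returns the three statistics as a labeled Series
stats = df['Values'].agg(['mean', 'median', 'std'])
print(stats['mean'], stats['median'])  # 30.0 30.0

# Sample (default, ddof=1) versus population (ddof=0) standard deviation
sample_std = df['Values'].std()
population_std = df['Values'].std(ddof=0)
print(sample_std > population_std)  # True
```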


# Answer 10

In [26]:
# Function that adds a 'MovingAverage' column to a DataFrame 'df' with 'Sales' and 'Date' columns.
# The window covers the current row and the 6 rows before it, i.e. the past 7 days when there is one row per day.

import pandas as pd

def calculate_moving_average(df):
    window_size = 7
    # Roll directly on the column so the result keeps df's own index;
    # min_periods=1 averages whatever is available for the first six rows
    df['MovingAverage'] = df['Sales'].rolling(window_size, min_periods=1).mean()
    return df

In [27]:
# Create sample DataFrame to use the function
data = {'Date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-07', '2022-01-08', '2022-01-09', '2022-01-10'],
        'Sales': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}
df = pd.DataFrame(data)

# Call the function
df_with_ma = calculate_moving_average(df)

# Display the DataFrame with the MovingAverage column
print(df_with_ma)

         Date  Sales  MovingAverage
0  2022-01-01    100          100.0
1  2022-01-02    200          150.0
2  2022-01-03    300          200.0
3  2022-01-04    400          250.0
4  2022-01-05    500          300.0
5  2022-01-06    600          350.0
6  2022-01-07    700          400.0
7  2022-01-08    800          500.0
8  2022-01-09    900          600.0
9  2022-01-10   1000          700.0
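
The function above counts rows, so it matches "the past 7 days" only when there is exactly one row per day. If dates can be missing, a time-based window may be closer to the intent. The following is a minimal sketch under that assumption, using made-up data with a gap; it is not the approach used in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-10']),
    'Sales': [100, 200, 300, 1000],
})

# A '7D' window needs a sorted datetime index; it covers the 7 days
# ending at (and including) each row's timestamp
df = df.sort_values('Date').reset_index(drop=True)
moving_average = df.set_index('Date')['Sales'].rolling('7D').mean()
df['MovingAverage'] = moving_average.to_numpy()

print(df['MovingAverage'].tolist())  # [100.0, 150.0, 200.0, 1000.0]
```

The last row averages only its own value because no other sale falls within the 7 days ending on 2022-01-10.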


# Answer 11

In [28]:
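# Function that adds a 'Weekday' column with the weekday name of each date in 'Date'
# (the 'Date' column must already have a datetime dtype)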
def add_weekday_column(df):
    df['Weekday'] = df['Date'].dt.strftime('%A')
    return df

In [30]:
# a sample example of using the above function

# create example dataframe
df = pd.DataFrame({
    'Date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-07'],
    'Sales': [100, 200, 300, 400, 500, 600, 700]
})

# convert 'Date' column to datetime format
df['Date'] = pd.to_datetime(df['Date'])

# add weekday column using the function defined above

df = add_weekday_column(df)
print(df.head())

        Date  Sales    Weekday
0 2022-01-01    100   Saturday
1 2022-01-02    200     Sunday
2 2022-01-03    300     Monday
3 2022-01-04    400    Tuesday
4 2022-01-05    500  Wednesday


# Answer 12

In [None]:
import pandas as pd

# create example dataframe
dates = pd.date_range(start='2023-01-01', end='2023-02-28')
df = pd.DataFrame({
    'Date': dates,
    # one sales value per date (100, 200, ...), so both columns have the same length
    'Sales': [100 * (i + 1) for i in range(len(dates))]
})

# select rows between '2023-01-01' and '2023-01-31'
start_date = '2023-01-01'
end_date = '2023-01-31'
mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
filtered_df = df.loc[mask]

print(filtered_df.head())
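
The selection can also be wrapped in a reusable function. The sketch below is one way to do so with Series.between(), which includes both endpoints by default; the function name select_date_range and its parameters are hypothetical.

```python
import pandas as pd

def select_date_range(df, start_date, end_date):
    # between() keeps rows with start_date <= Date <= end_date
    return df[df['Date'].between(start_date, end_date)]

df = pd.DataFrame({'Date': pd.date_range(start='2023-01-01', end='2023-02-28')})
df['Sales'] = list(range(100, 100 * (len(df) + 1), 100))

january = select_date_range(df, '2023-01-01', '2023-01-31')
print(len(january))  # 31
```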

# Answer 13

The only library that must be imported to use the basic functions of pandas is pandas itself, conventionally with import pandas as pd. It provides the DataFrame and Series objects, along with functions for creating, manipulating, and analyzing tabular data.