Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [1]:
import pandas as pd

data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)
print(series)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to the
variable, and print the result.

In [2]:
import pandas as pd

my_list = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
series = pd.Series(my_list)
print(series)

0     10
1     20
2     30
3     40
4     50
5     60
6     70
7     80
8     90
9    100
dtype: int64


Q3. Create a Pandas DataFrame that contains the following data:
     Name    Age    Gender
     Alice   25     Female
     Bob     30     Male
     Claire  27     Female

In [3]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

df = pd.DataFrame(data)
print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


Q4. What is a ‘DataFrame’ in pandas, and how is it different from pandas.Series? Explain with an example.

In [None]:
In Pandas, a DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns). It is one of the core data structures provided by Pandas and is designed to hold and manipulate structured data.

Here are the key characteristics of a Pandas DataFrame:

1. Tabular Structure: A DataFrame represents data in a tabular form, where data is organized into rows and columns, similar to a spreadsheet or a SQL table.

2. Heterogeneous Data: Each column in a DataFrame can hold data of different data types (e.g., integers, floats, strings). This allows you to work with mixed data types within the same DataFrame.

3. Labeled Axes: Both rows and columns in a DataFrame have labels or index values, making it easy to access and manipulate specific data points.

4. Size Mutable: You can add, remove, or modify rows and columns in a DataFrame.

Now, let's compare a DataFrame with a Pandas Series using an example:

python

import pandas as pd

# Creating a Pandas Series
ser = pd.Series([10, 20, 30, 40, 50], name='Numbers')

# Creating a Pandas DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}
df = pd.DataFrame(data)

print("Pandas Series:")
print(ser)
print("\nPandas DataFrame:")
print(df)

In this example:

The ser variable represents a Pandas Series. It's a one-dimensional labeled array; in this case it holds five integers under the name 'Numbers', indexed 0 through 4 by default.

The df variable represents a Pandas DataFrame. It's a two-dimensional structure with labeled columns ('Name', 'Age', 'Gender') and rows (0, 1, 2). It holds more structured data, including not just numbers but also strings (names and genders), making it suitable for tabular data like a dataset of people's information.

In summary, the key difference is dimensionality: a Series is one-dimensional and typically holds a single column of data, whereas a DataFrame is two-dimensional and holds multiple columns, each of which can have its own data type. This makes a DataFrame suitable for structured datasets such as spreadsheets or database tables.
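
To make the relationship concrete, here is a minimal sketch that rebuilds the same df and shows that selecting one column yields a Series, while selecting a list of columns yields a DataFrame. The names age_column and age_frame are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
})

# Single brackets with one column label return a one-dimensional Series.
age_column = df['Age']
print(type(age_column).__name__, age_column.ndim, age_column.shape)

# Double brackets (a list of labels) return a two-dimensional DataFrame.
age_frame = df[['Age']]
print(type(age_frame).__name__, age_frame.ndim, age_frame.shape)

# The Series keeps the DataFrame's row index and uses the column label as its name.
print(age_column.name, age_column.index.equals(df.index))
```

In other words, a DataFrame can be read as a collection of Series that share one row index.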

Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can
you give an example of when you might use one of these functions?

In [None]:
Pandas provides many functions and methods for manipulating data in a DataFrame. The most common ones are listed below, each with an example of when you might use it:

1. Selecting Data:
df['column']: Select a single column (returns a Series).
df[['col1', 'col2']]: Select multiple columns (returns a DataFrame).
df.loc[row_label]: Select rows by index label.
df.iloc[row_position]: Select rows by integer position.
Example: Selecting specific columns from a DataFrame.

python

selected_columns = df[['Name', 'Age']]

2. Filtering Data:
df[df['column'] > value]: Filter rows based on a condition.
Example: Filtering rows where Age is greater than 25.

python

filtered_data = df[df['Age'] > 25]

3. Sorting Data:
df.sort_values('column'): Sort the DataFrame by a specific column.
df.sort_values(['col1', 'col2']): Sort by multiple columns.
Example: Sorting a DataFrame by Age in ascending order.

python

sorted_data = df.sort_values('Age')

4. Grouping and Aggregating Data:
df.groupby('column').agg({'other_column': 'function'}): Group data by a column and perform aggregation functions.
Example: Calculate the average age for each gender.

python

avg_age_by_gender = df.groupby('Gender').agg({'Age': 'mean'})

5. Adding and Removing Columns:
df['new_column'] = values: Add a new column.
df.drop('column', axis=1): Remove a column. This returns a new DataFrame and leaves df unchanged unless you assign the result back.
Example: Adding a new column for calculating years until retirement.

python

df['Years_until_retirement'] = 65 - df['Age']

6. Handling Missing Data:
df.dropna(): Remove rows with missing values.
df.fillna(value): Fill missing values with a specified value.
Example: Removing rows with missing values.

python

df_cleaned = df.dropna()

7. Merging and Joining DataFrames:
pd.concat([df1, df2]): Concatenate DataFrames vertically (axis=0, the default) or horizontally (axis=1).
pd.merge(df1, df2, on='key'): Merge DataFrames based on a common column.
Example: Merging two DataFrames based on a common key column.

python

merged_data = pd.merge(df1, df2, on='ID')
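
The merge line above assumes that df1 and df2 already exist. As a minimal runnable sketch, the tables below are hypothetical (the 'City' column and its values are invented for illustration) and share an 'ID' key column:

```python
import pandas as pd

# Hypothetical tables sharing an 'ID' key column.
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Claire']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'City': ['Paris', 'Lima', 'Oslo']})

# The default how='inner' keeps only IDs present in both frames (2 and 3).
merged_data = pd.merge(df1, df2, on='ID')
print(merged_data)

# how='left' keeps every row of df1; IDs without a match get NaN in 'City'.
left_merged = pd.merge(df1, df2, on='ID', how='left')
print(left_merged)

# pd.concat with the default axis=0 appends rows; ignore_index renumbers them.
stacked = pd.concat([df1, df1], ignore_index=True)
print(stacked.shape)
```

Choosing between how='inner', 'left', 'right', and 'outer' decides which unmatched keys survive the merge, so it is worth stating explicitly when the two tables may not cover the same IDs.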

These are just a few of the many functions and methods available in Pandas for data manipulation. Depending on your analysis, you can combine them to clean, transform, and summarize the data in a DataFrame.
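
The operations listed above are often chained in one workflow. The following is a rough sketch rather than a recommended recipe: it extends the small people table with a hypothetical fourth row ('Dan', with a missing age) so that the missing-data step has something to act on.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire', 'Dan'],   # 'Dan' is a made-up extra row
    'Age': [25, 30, 27, np.nan],                 # his age is deliberately missing
    'Gender': ['Female', 'Male', 'Female', 'Male']
})

# Label-based vs. position-based selection. They coincide here only because
# the default index labels are 0, 1, 2, 3.
print(df.loc[1, 'Name'])     # Bob
print(df.iloc[-1]['Name'])   # Dan

# Handle missing data: either drop incomplete rows or fill the gap.
complete_rows = df.dropna()
filled = df.fillna({'Age': df['Age'].mean()})   # mean() skips NaN by default

# Filter, then sort the remaining rows from oldest to youngest.
over_25 = filled[filled['Age'] > 25].sort_values('Age', ascending=False)
print(over_25)

# Group and aggregate: average age per gender.
avg_age_by_gender = filled.groupby('Gender')['Age'].mean()
print(avg_age_by_gender)
```

Each step returns a new object, so the original df still contains the missing value at the end.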

Q6. Which of the following is mutable in nature: Series, DataFrame, Panel?

In [None]:
According to the pandas documentation, all pandas data structures are value-mutable: the values they contain can be changed after creation. Where Series, DataFrame, and Panel differ is in size mutability.

Series is value-mutable but described as size-immutable. You can overwrite existing elements in place (for example, ser.iloc[0] = 99), but the documentation states that the length of a Series cannot be changed.

DataFrame is both value-mutable and size-mutable. You can modify existing values and also add or remove columns and rows on the same object, which makes it suitable for most data manipulation tasks.

Panel, a three-dimensional data structure in earlier versions of Pandas, was likewise value-mutable and size-mutable. It was deprecated in pandas 0.20 in favor of MultiIndex DataFrames and removed in pandas 0.25, so it is not available in current releases.

So, all three are mutable with respect to their values. If you also need to grow or shrink tabular data, you should use a DataFrame.
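
Mutability is easy to check directly. The sketch below is a small experiment, not an exhaustive test: it overwrites one value of a Series in place and then adds and removes a column on a DataFrame. The column name 'Age_next_year' is made up for illustration, and Panel is omitted because it no longer exists in current pandas releases.

```python
import pandas as pd

# Value mutation on a Series: the existing object is changed in place.
ser = pd.Series([10, 20, 30], name='Numbers')
ser.iloc[0] = 99
print(ser.tolist())   # [99, 20, 30]

# Value and size mutation on a DataFrame.
df = pd.DataFrame({'Age': [25, 30, 27]})
df.loc[0, 'Age'] = 26                       # change a single value
df['Age_next_year'] = df['Age'] + 1         # add a column (size changes)
del df['Age']                               # remove a column (size changes)
print(df.columns.tolist())   # ['Age_next_year']
print(df.shape)              # (3, 1)
```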

Q7. Create a DataFrame using multiple Series. Explain with an example.

In [None]:
You can create a Pandas DataFrame using multiple Series by combining these Series into a dictionary, where each Series becomes a column in the DataFrame. Here's an example:

python

import pandas as pd

# Creating multiple Series
names = pd.Series(['Alice', 'Bob', 'Claire'])
ages = pd.Series([25, 30, 27])
genders = pd.Series(['Female', 'Male', 'Female'])

# Combining the Series into a DataFrame
data = {
    'Name': names,
    'Age': ages,
    'Gender': genders
}

df = pd.DataFrame(data)

# Printing the DataFrame
print(df)

In this example:

We create three Pandas Series: names, ages, and genders, each containing data for a specific column.

We create a dictionary data, where the keys are the column names ('Name', 'Age', 'Gender') and the values are the corresponding Series.

We use the pd.DataFrame constructor to combine the Series into a DataFrame named df. Each Series becomes a column in the DataFrame.

Finally, we print the resulting DataFrame df, which will look like this:

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female

So, you can create a DataFrame by using multiple Series, and this is a common approach when you have different sets of data that you want to combine into a single tabular structure for analysis.
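
One detail worth knowing when building a DataFrame from Series: the rows are matched by index label, not by position. The example above hides this because all three Series use the default 0, 1, 2 index. The following sketch uses hypothetical Series with custom, only partly overlapping labels (the 'City' data is invented for illustration) to show the alignment.

```python
import pandas as pd

# Hypothetical Series whose index labels only partly overlap.
ages = pd.Series([25, 30], index=['alice', 'bob'])
cities = pd.Series(['Paris', 'Oslo'], index=['bob', 'claire'])

# The DataFrame index is the union of the labels; gaps are filled with NaN.
df = pd.DataFrame({'Age': ages, 'City': cities})
print(df)

# 'Age' becomes float64 because NaN cannot be stored in an int64 column.
print(df['Age'].dtype)

# pd.concat along axis=1 aligns on the index in the same way.
df_concat = pd.concat([ages.rename('Age'), cities.rename('City')], axis=1)
print(df_concat.shape)
```

If the Series are meant to line up by position instead, giving them the same index first (for example with reset_index(drop=True)) avoids unexpected NaN values.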