# Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [2]:
import pandas as pd

# Create a Pandas Series with the given data
data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)

# Print the series
print(series)


0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


# Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to it, and print the result.

In [3]:
# Create a list containing 10 elements
my_list = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Apply the pandas.Series function on the list
series_from_list = pd.Series(my_list)

# Print the series
series_from_list


0     10
1     20
2     30
3     40
4     50
5     60
6     70
7     80
8     90
9    100
dtype: int64


# Q3. Create a Pandas DataFrame that contains the following data:

In [4]:
import pandas as pd

# Create a dictionary with the data
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

# Create a Pandas DataFrame from the dictionary
df = pd.DataFrame(data)

# Print the DataFrame
print(df)


     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


# Q4. What is a ‘DataFrame’ in pandas and how is it different from a pandas.Series? Explain with an example.

A DataFrame in Pandas is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure. It can be thought of as a spreadsheet or a SQL table. A DataFrame consists of rows and columns, where each column can contain different types of data (integers, floats, strings, etc.).

# Key Differences Between DataFrame and Series:

Dimension:

DataFrame: Two-dimensional (rows and columns).
Series: One-dimensional (a single column of data).

Structure:

DataFrame: Can store multiple Series objects as its columns, where each Series represents a single column of the DataFrame.
Series: Represents a single column of data with an index.

Usage:

DataFrame: Used for more complex data representations, typically for datasets with multiple features (columns).
Series: Used for simpler datasets or when you only need to handle one-dimensional data.

Example:
Here’s a simple example to illustrate the differences:

In [5]:
import pandas as pd

# Create a Pandas Series
age_series = pd.Series([25, 30, 27], index=['Alice', 'Bob', 'Claire'])
print("Series:")
print(age_series)

# Create a Pandas DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}
df = pd.DataFrame(data)
print("\nDataFrame:")
print(df)


Series:
Alice     25
Bob       30
Claire    27
dtype: int64

DataFrame:
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
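
The relationship between the two structures can also be checked directly. The sketch below reuses the same sample data and assumes nothing beyond standard pandas indexing: selecting a single column yields a Series, while selecting with a list of column names yields a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

# Selecting one column returns a Series that shares the DataFrame's index
age_column = df['Age']
print(type(age_column).__name__)
print(age_column.ndim, df.ndim)

# Selecting with a list of column names returns a DataFrame instead
age_frame = df[['Age']]
print(type(age_frame).__name__)
print(df.shape)
```

The single versus double brackets are what decide whether the result is one-dimensional or two-dimensional.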


# Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

Pandas provides a wide array of functions to manipulate data in a DataFrame. Here are some common functions, along with examples of when you might use them:

Common Functions for Data Manipulation:
head(): Returns the first few rows of the DataFrame.

Use Case: Quickly preview the data to understand its structure.

    df.head()

tail(): Returns the last few rows of the DataFrame.

Use Case: Check the end of the dataset after modifications.

    df.tail()

info(): Provides a summary of the DataFrame, including the index dtype and columns, non-null values, and memory usage.

Use Case: Understand the structure and data types in the DataFrame.

    df.info()

describe(): Generates descriptive statistics for numerical columns (count, mean, std, min, 25%, 50%, 75%, max).

Use Case: Get a quick statistical summary of your data.

    df.describe()

drop(): Removes specified rows or columns from the DataFrame.

Use Case: Remove unnecessary columns from the DataFrame.

    df.drop('Gender', axis=1, inplace=True)

Boolean indexing: Subsets the rows of the DataFrame based on a condition. Note that DataFrame.filter() is a different tool: it selects rows or columns by label, not by value.

Use Case: Select specific rows based on conditions.

    df[df['Age'] > 25]

groupby(): Groups the data based on certain criteria and allows for aggregation.

Use Case: Calculate the average age by gender.

    df.groupby('Gender')['Age'].mean()

merge(): Combines two DataFrames based on a common column.

Use Case: Join additional information to an existing DataFrame.

    pd.merge(df1, df2, on='common_column')

sort_values(): Sorts the DataFrame by one or more columns.

Use Case: Sort data by age.

    df.sort_values(by='Age')
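
fillna(): Fills missing values in the DataFrame.

Use Case: Replace NaN values in numeric columns with the column mean or median. Passing numeric_only=True keeps mean() from failing on text columns such as Name.

    df.fillna(df.mean(numeric_only=True), inplace=True)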
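
Several of the functions listed above can be combined on one small frame. The following is a minimal sketch rather than a recommended pipeline; the `cities` table and its values are hypothetical and exist only to give `merge()` something to join.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

# Hypothetical lookup table, invented for this illustration
cities = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'City': ['Paris', 'Oslo', 'Lima'],
})

# Boolean indexing: keep only the rows where Age is greater than 25
older = people[people['Age'] > 25]

# groupby(): average age per gender
mean_age = people.groupby('Gender')['Age'].mean()

# merge(): attach the City column by matching on Name
merged = pd.merge(people, cities, on='Name')

# sort_values(): oldest first
by_age = merged.sort_values(by='Age', ascending=False)

# drop(): returns a new frame without Gender; by_age itself is unchanged
trimmed = by_age.drop(columns='Gender')
print(trimmed)
```

Leaving out `inplace=True` here means each step returns a new object, which makes it easier to inspect the intermediate results.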

# Example of Using a Function:
Suppose you have a DataFrame containing employee data, and you want to get an overview of the numerical features:

In [6]:
import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Salary': [50000, 60000, 55000]
}
df = pd.DataFrame(data)

# Use describe() to get a statistical summary
stats_summary = df.describe()
print(stats_summary)


             Age   Salary
count   3.000000      3.0
mean   27.333333  55000.0
std     2.516611   5000.0
min    25.000000  50000.0
25%    26.000000  52500.0
50%    27.000000  55000.0
75%    28.500000  57500.0
max    30.000000  60000.0


# Q6. Which of the following are mutable in nature: Series, DataFrame, Panel?

In Pandas, the following are mutable in nature:

# Series: Mutable

A Pandas Series can be modified after it has been created. You can change individual values, add new values, or delete values.

# DataFrame: Mutable

A DataFrame is also mutable. You can add or remove columns, change individual cell values, and modify the overall structure of the DataFrame.
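
The kinds of modification described above can be sketched as follows. The labels and values are illustrative only.

```python
import pandas as pd

# Series: change, add, and delete values by label
ages = pd.Series([25, 30, 27], index=['Alice', 'Bob', 'Claire'])
ages['Bob'] = 31       # change an existing value
ages['Dan'] = 22       # assigning to a new label adds an entry
del ages['Alice']      # remove an entry by label
print(ages)

# DataFrame: add a column, change a cell, remove a column
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df['Gender'] = ['Female', 'Male']   # add a column
df.loc[0, 'Age'] = 26               # change a single cell
del df['Name']                      # remove a column
print(df)
```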

# Panel: Mutable (but deprecated)

A Panel was mutable as well, but it was deprecated in pandas 0.20.0 and removed entirely in 0.25.0. The recommended replacements are a DataFrame with hierarchical indexing (MultiIndex) or the xarray library for multi-dimensional data.
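
As a rough illustration of the hierarchical-indexing alternative, the sketch below stores data that has three logical dimensions (store, quarter, measurement) in an ordinary DataFrame. The store names and figures are hypothetical.

```python
import pandas as pd

# Hypothetical data: two stores, two quarters, two measurements
index = pd.MultiIndex.from_product(
    [['StoreA', 'StoreB'], ['Q1', 'Q2']],
    names=['store', 'quarter'],
)
sales = pd.DataFrame(
    {'units': [10, 12, 7, 9], 'revenue': [100.0, 120.0, 70.0, 90.0]},
    index=index,
)

# Selecting on the outer level returns the 2-D slice for one store
store_a = sales.loc['StoreA']
print(store_a)

# A full (store, quarter) key plus a column name addresses a single cell
units_b_q2 = sales.loc[('StoreB', 'Q2'), 'units']
print(units_b_q2)
```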

# Q7. Create a DataFrame using multiple Series. Explain with an example.

In [7]:
import pandas as pd

# Create individual Series
name_series = pd.Series(['Alice', 'Bob', 'Claire'])
age_series = pd.Series([25, 30, 27])
gender_series = pd.Series(['Female', 'Male', 'Female'])

# Create a DataFrame using the Series
data = {
    'Name': name_series,
    'Age': age_series,
    'Gender': gender_series
}

df = pd.DataFrame(data)

# Print the DataFrame
print(df)


     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
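
An alternative to the dictionary approach is `pd.concat` with `axis=1`, which uses each Series' `name` as the column label. The sketch below also shows that Series are combined by index label rather than by position; `short_series` and its salary values are hypothetical and added only to demonstrate that alignment.

```python
import pandas as pd

name_series = pd.Series(['Alice', 'Bob', 'Claire'], name='Name')
age_series = pd.Series([25, 30, 27], name='Age')

# Place the Series side by side; column labels come from each Series' name
df_concat = pd.concat([name_series, age_series], axis=1)
print(df_concat)

# Hypothetical Series that has values only for index labels 0 and 2
short_series = pd.Series([50000, 60000], index=[0, 2])

# Rows are matched on index labels, so label 1 gets NaN in Salary
df_aligned = pd.DataFrame({'Name': name_series, 'Salary': short_series})
print(df_aligned)
```

Because of this alignment, Series built with different indexes should be checked before combining them, since mismatched labels produce missing values rather than an error.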
