# Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [2]:
import pandas as pd

ser = pd.Series([4, 8, 15, 16, 23, 42])

print(ser)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


# Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to it, and print the result.

In [3]:
my_list = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

series1 = pd.Series(my_list)

print(series1)


0     3
1     6
2     9
3    12
4    15
5    18
6    21
7    24
8    27
9    30
dtype: int64


# Q3. Create a Pandas DataFrame that contains the following data:

| Name | Age | Gender|
|----------|----------|----------|
| Alice | 25 | Female |
| Bob | 30 | male   |
| Claire | 27 | Female |

# Then, print the DataFrame.

In [6]:
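data = {"Name": ["Alice", "Bob", "Claire"], "Age": [25, 30, 27],
        "Gender": ["Female", "male", "Female"]}
df = pd.DataFrame(data)  # build the table directly instead of reading an external CSV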

print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    male
2  Claire   27  Female


In [8]:
df.head(3)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    male
2  Claire   27  Female


# Q4. What is a DataFrame in pandas, and how is it different from a pandas.Series? Explain with an example.

A DataFrame is a two-dimensional labeled data structure made up of rows and columns, much like a table in a relational database or an Excel spreadsheet. It is one of the primary data structures provided by the pandas library. DataFrames store data in tabular form and provide a wide range of methods for data analysis.

A pandas Series is a one-dimensional labeled array that can hold data of any type (integers, strings, floats, Python objects, and so on). Each column of a DataFrame is a Series, so a Series can be seen as the building block of a DataFrame: a DataFrame can be thought of as a collection of Series that share the same index.

In [2]:
import pandas as pd

# Pandas Series
s = pd.Series([10, 20, 30, 40, 50])
print(type(s))
print(s)


<class 'pandas.core.series.Series'>
0    10
1    20
2    30
3    40
4    50
dtype: int64


In [4]:
# DataFrame built from two Series
data = {
    'A': pd.Series([10, 20, 30, 40, 50]),
    'B': pd.Series(['a', 'b', 'c', 'd', 'e'])
}

df = pd.DataFrame(data)
print(type(df))
print(df)


<class 'pandas.core.frame.DataFrame'>
    A  B
0  10  a
1  20  b
2  30  c
3  40  d
4  50  e
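
To make the relationship between the two structures concrete, the short sketch below (using the same values as the cells above) selects one column from a DataFrame and checks that the result is a Series that carries the DataFrame's index. The variable name `col` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50],
    'B': ['a', 'b', 'c', 'd', 'e'],
})

col = df['A']                       # selecting a single column returns a Series
print(type(col))                    # <class 'pandas.core.series.Series'>
print(col.index.equals(df.index))   # True: the column shares the DataFrame's index
print(df.ndim, col.ndim)            # 2 1
print(df.shape, col.shape)          # (5, 2) (5,)
```

The DataFrame has two axes (rows and columns), while the Series has only one.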


# Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

Pandas provides a wide range of functions to manipulate and analyze data within a DataFrame. Some common functions used for data manipulation in Pandas include:

1. **head() and tail()**: These functions display the first few (head) or last few (tail) rows of the DataFrame. They are helpful to quickly examine the structure and contents of the DataFrame.

2. **describe()**: It provides summary statistics of numerical columns in the DataFrame, such as count, mean, standard deviation, minimum, maximum, and quartile values.

3. **info()**: This function prints a concise summary of the DataFrame, including the index range, column names, data types, non-null counts, and memory usage.

4. **drop()**: It is used to drop specified rows or columns from the DataFrame.

5. **groupby()**: This function is used to group data based on one or more columns and perform aggregate functions on those groups.

6. **apply()**: It applies a function along an axis of the DataFrame.

7. **fillna() and dropna()**: These functions are used for handling missing values by filling them with a specified value or dropping rows/columns with missing data.

8. **sort_values()**: It sorts the DataFrame by one or more columns.

9. **merge() and concat()**: These functions are used for combining DataFrames.

For example, the `groupby()` function could be used to group sales data by region and find the average sales amount in each region. This would allow analysis and comparison of sales performance across different regions.
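
A minimal sketch of that sales scenario follows. The `sales` DataFrame, its column names, and all figures are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sales records; the regions and amounts are invented for this sketch
sales = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'East', 'South', 'East'],
    'Sales': [120, 200, 180, 150, 400, 250],
})

# Group rows by region, then average the 'Sales' column within each group
avg_by_region = sales.groupby('Region')['Sales'].mean()
print(avg_by_region)
# East     200.0
# North    150.0
# South    300.0
```

The result is a Series indexed by region, which makes it easy to compare regions or sort them by average sales.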

In [None]:
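# Illustrative calls only: 'Column1', 'Category', 'Value', 'Key_Column', etc. are placeholder names
df.head()  # Display the first 5 rows of the DataFrame (the default)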
df.tail(7)  # Display the last 7 rows of the DataFrame

df.describe()  # Summary statistics of numerical columns

df.info()  # Information about the DataFrame

df.drop(columns=['Column1', 'Column2'])  # Drop specified columns
df.drop(index=[0, 4, 7])  # Drop specified rows

grouped = df.groupby('Category')  # Grouping data by 'Category' column
grouped['Value'].mean()  # Calculate the mean of 'Value' column within each group

df['New_Column'] = df['Existing_Column'].apply(lambda x: x * 2)  # Apply a function to create a new column based on an existing one

df.sort_values(by='Column_Name', ascending=False)  # Sort DataFrame by 'Column_Name' in descending order

new_df = pd.concat([df, df])  # Concatenating two DataFrames
merged_df = pd.merge(df, df, on='Key_Column')  # Merging two DataFrames on a common column
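
The calls above use placeholder column names, so they will not run against the small DataFrames created earlier. As a rough, self-contained sketch, the cell below applies `fillna()`, `dropna()`, `merge()`, and `sort_values()` to two small tables. The `orders` and `customers` DataFrames and their contents are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two orders have a missing amount
orders = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'amount': [50.0, np.nan, 20.0, np.nan],
})
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Claire'],
})

filled = orders.fillna({'amount': 0.0})       # replace missing amounts with 0
dropped = orders.dropna(subset=['amount'])    # or drop the rows that lack an amount

# Left merge keeps every order, even when no matching customer exists
merged = pd.merge(filled, customers, on='customer_id', how='left')

# Sort so that the largest order comes first
ranked = merged.sort_values(by='amount', ascending=False)
print(ranked)
```

Each of these methods returns a new DataFrame and leaves `orders` unchanged, which is why the results are assigned to new variables.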

# Q6. Which of the following are mutable in nature: Series, DataFrame, Panel?

In pandas, both Series and DataFrame are mutable, meaning their contents can be modified after creation. Panel, a 3D data structure, was also mutable while it existed, but it was deprecated in version 0.20.0 and removed in version 0.25.0. It was dropped because it saw little consistent use; multi-dimensional data is now handled with MultiIndex DataFrames, NumPy arrays, or the xarray library.

With current pandas versions you only need Series and DataFrame. Values in both can be changed in place, and a DataFrame's columns can be added, removed, or renamed after creation.
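
A small sketch of what mutability means in practice, using made-up values: the Series and the DataFrame below are both changed after they have been created.

```python
import pandas as pd

s = pd.Series([10, 20, 30])
s.loc[1] = 99                 # overwrite the value stored under label 1
print(s.tolist())             # [10, 99, 30]

df = pd.DataFrame({'A': [1, 2, 3]})
df['B'] = df['A'] * 2         # add a new column to the existing DataFrame
df.loc[0, 'A'] = 100          # overwrite a single cell
print(df)
```

Both objects keep their identity; only their contents change.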

# Q7. Create a DataFrame using multiple Series. Explain with an example.

You can create a DataFrame from multiple pandas Series by passing them to `pd.DataFrame()` in a dictionary.

In the example below, two pandas Series, `s1` and `s2`, are created. `s1` contains integers, and `s2` contains strings. These Series are then used to create a DataFrame, `df`. The resulting DataFrame has two columns, 'Column_A' and 'Column_B', derived from `s1` and `s2` respectively. The `name` attributes of the Series ('A' and 'B') are not used here, because the dictionary keys take precedence as column names.

The `pd.DataFrame()` function is used to create the DataFrame, and the data is passed in the form of a dictionary where the keys are the column names, and the values are the Series you want to include in the DataFrame.

This way, you can combine multiple Series into a single DataFrame, which is a more structured and organized way to work with related data in a tabular form.

In [10]:
import pandas as pd

s1 = pd.Series([10, 20, 30, 40, 50], name='A')
s2 = pd.Series(['a', 'b', 'c', 'd', 'e'], name='B')

print(s1)
print("\n")

data = {'Column_A': s1, 'Column_B': s2}
df = pd.DataFrame(data)

print(df)

0    10
1    20
2    30
3    40
4    50
Name: A, dtype: int64


   Column_A Column_B
0        10        a
1        20        b
2        30        c
3        40        d
4        50        e
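
The Series in the example above share the same default index. As a further sketch with made-up labels, the cell below shows what happens when the indexes differ: pandas aligns the Series on the union of their labels and fills the gaps with NaN. It also shows `pd.concat` with `axis=1`, which uses the Series names as column names.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=['x', 'y', 'z'], name='A')
s2 = pd.Series(['a', 'b'], index=['y', 'z'], name='B')

# Dictionary keys become column names; rows are aligned on the index labels
aligned = pd.DataFrame({'Column_A': s1, 'Column_B': s2})
print(aligned)   # row 'x' has NaN in Column_B because s2 has no label 'x'

# pd.concat with axis=1 places the Series side by side, named after each Series
side_by_side = pd.concat([s1, s2], axis=1)
print(list(side_by_side.columns))   # ['A', 'B']
```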
