In [1]:
# Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [1]:
import pandas as pd

data = [4, 8, 15, 16, 23, 42]

pd.Series(data)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64

In [2]:
# Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to it, and print the result.

In [4]:
import pandas

list_data = [12, 52.4, 'Data Science', True, 1+3j, 'Henil', False, 98.99, 12+45j, 108]

pandas.Series(list_data)

0              12
1            52.4
2    Data Science
3            True
4          (1+3j)
5           Henil
6           False
7           98.99
8        (12+45j)
9             108
dtype: object

##### Q3. Create a Pandas DataFrame that contains the following data:

| Name | Age | Gender |
| :- | -: | :-: |
| Alice | 25 | Female |
| Bob | 30 | Male |
| Claire | 27 | Female |


##### Then, print the DataFrame.

In [10]:
import pandas as pd

# Build the column data as a dictionary, then convert it to a DataFrame
data = {'Name': ['Alice', 'Bob', 'Claire'], 'Age': [25, 30, 27], 'Gender': ['Female', 'Male', 'Female']}

df = pd.DataFrame(data)
print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


In [11]:
# Q4. What is a ‘DataFrame’ in pandas and how is it different from pandas.Series? Explain with an example.

In pandas, a **DataFrame** is a two-dimensional tabular data structure that consists of rows and columns. It is one of the core data structures provided by the pandas library, and it's similar to a spreadsheet or a SQL table. Each column in a DataFrame can contain different types of data, such as numbers, strings, or even more complex objects, and each row represents a unique observation or record.

On the other hand, a **Series** is a one-dimensional labeled array in pandas. It can be thought of as a single column of data with labels (index) attached to each element. A Series can hold any type of data, similar to a NumPy array, but it also has an index that allows for quick and label-based access to the data.

Here's an example to illustrate the difference between a DataFrame and a Series:

In [12]:
import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)

# Creating a Pandas Series
ages = pd.Series([25, 30, 22, 28], name='Age')

print("DataFrame:")
print(df)
print("\nSeries:")
print(ages)

DataFrame:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   22      Chicago
3    David   28      Houston

Series:
0    25
1    30
2    22
3    28
Name: Age, dtype: int64
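
The relationship between the two structures can also be seen by pulling a single column or row out of a DataFrame. The following is a minimal sketch that reuses the data from the example above; the variable names `age_column` and `first_row` are chosen here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
})

# Selecting a single column returns a Series that shares the DataFrame's index
age_column = df['Age']
print(type(age_column).__name__)
print(df.ndim, age_column.ndim)

# Selecting a single row by label also returns a Series,
# this time indexed by the column names
first_row = df.loc[0]
print(first_row['City'])
```

In other words, a DataFrame can be viewed as a collection of Series that share one index.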


In [13]:
# Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

Pandas provides a wide range of functions to manipulate data in a DataFrame. Here are some common functions along with examples of when you might use them:

**1. head() and tail():** These functions return the first or last rows of a DataFrame, respectively (five rows by default, or `n` rows if you pass a number).

In [22]:
import pandas as pd

data = {'Name': ['HENIL', 'HANSRAJ', 'ABHI', 'YASH'],
        'Age': [24, 23, 25, 22]}
df = pd.DataFrame(data)

print(df.head(2))

print(df.tail(1))

      Name  Age
0    HENIL   24
1  HANSRAJ   23
   Name  Age
3  YASH   22


**2. shape:** This attribute returns the dimensions (rows, columns) of the DataFrame.

In [23]:
print(df.shape)

(4, 2)


**3. describe():** Generates summary statistics of the numeric columns in the DataFrame.

In [24]:
print(df.describe())

             Age
count   4.000000
mean   23.500000
std     1.290994
min    22.000000
25%    22.750000
50%    23.500000
75%    24.250000
max    25.000000


**4. sort_values():** Sorts the DataFrame by one or more columns.

In [25]:
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)

      Name  Age
2     ABHI   25
0    HENIL   24
1  HANSRAJ   23
3     YASH   22


**5. groupby():** Groups data based on a column and allows you to perform aggregate operations on those groups.

In [26]:
grouped = df.groupby('Age')
average_age = grouped['Age'].mean()
print(average_age)

Age
22    22.0
23    23.0
24    24.0
25    25.0
Name: Age, dtype: float64
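
Because every age in this small DataFrame is unique, each group above contains a single row. Grouping is more informative when the key repeats. The sketch below assumes a hypothetical `Team` column that is not part of the original data, and computes the mean age per team.

```python
import pandas as pd

# 'Team' is a hypothetical column added only for this illustration
team_df = pd.DataFrame({
    'Name': ['HENIL', 'HANSRAJ', 'ABHI', 'YASH'],
    'Team': ['A', 'B', 'A', 'B'],
    'Age': [24, 23, 25, 22]
})

# Split the rows by team, then average the 'Age' column within each group
mean_age_by_team = team_df.groupby('Team')['Age'].mean()
print(mean_age_by_team)
```

Here team A averages 24.5 and team B averages 22.5, since each group now combines two rows.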


**6. drop():** Removes specified rows or columns from the DataFrame.

In [28]:
modified_df = df.drop(index=[1, 3])
print(modified_df)

    Name  Age
0  HENIL   24
2   ABHI   25


**7. apply():** Applies a function to each element of a Series, or along an axis (rows or columns) of a DataFrame.

In [29]:
def double_age(age):
    return age * 2

df['Double_Age'] = df['Age'].apply(double_age)
print(df)

      Name  Age  Double_Age
0    HENIL   24          48
1  HANSRAJ   23          46
2     ABHI   25          50
3     YASH   22          44
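
The example above calls `apply()` on a single column, which is a Series. As a minimal sketch of the DataFrame form, passing `axis=1` hands each row to the function as a Series. The helper `describe_row` and the `Summary` column are hypothetical names used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['HENIL', 'HANSRAJ', 'ABHI', 'YASH'],
                   'Age': [24, 23, 25, 22]})

def describe_row(row):
    # 'row' is a Series indexed by the column names
    return f"{row['Name']} is {row['Age']} years old"

# axis=1 applies the function once per row
df['Summary'] = df.apply(describe_row, axis=1)
print(df['Summary'].tolist())
```

For simple arithmetic such as doubling a column, a vectorized expression like `df['Age'] * 2` is usually preferable to `apply()`.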


**8. merge() and join():** Combine DataFrames. merge() matches rows on one or more common columns (or on indexes), while join() matches on the index by default.

In [30]:
left_df = pd.DataFrame({'ID': [1, 2, 3], 'Value': ['A', 'B', 'C']})
right_df = pd.DataFrame({'ID': [2, 3, 4], 'Value': ['X', 'Y', 'Z']})

merged_df = pd.merge(left_df, right_df, on='ID', how='inner')
print(merged_df)

   ID Value_x Value_y
0   2       B       X
1   3       C       Y
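
For comparison, here is a rough sketch of the same inner combination done with `join()`. Since `join()` aligns on the index, `ID` is moved into the index first. The suffixes `_left` and `_right` are arbitrary choices for this illustration.

```python
import pandas as pd

left_df = pd.DataFrame({'ID': [1, 2, 3], 'Value': ['A', 'B', 'C']})
right_df = pd.DataFrame({'ID': [2, 3, 4], 'Value': ['X', 'Y', 'Z']})

# join() aligns on the index, so move 'ID' into the index first
left_indexed = left_df.set_index('ID')
right_indexed = right_df.set_index('ID')

# Overlapping column names must be given explicit suffixes when using join()
joined_df = left_indexed.join(right_indexed, how='inner',
                              lsuffix='_left', rsuffix='_right')
print(joined_df)
```

The result keeps only IDs 2 and 3, just like the `merge()` example, but `ID` is now the index rather than a regular column.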


In [31]:
# Q6. Which of the following is mutable in nature Series, DataFrame, Panel?

All three structures were designed to be value-mutable: the values they hold can be changed after creation. They differ in whether their size can change, and Panel is no longer part of pandas.

1. **Series**: A pandas Series is value-mutable, meaning you can overwrite its elements after it's created. The pandas documentation describes its length as fixed, so operations such as `drop()` return a new Series rather than resizing the original.

2. **DataFrame**: A pandas DataFrame is both value-mutable and size-mutable. You can change cell values and add or remove columns in place.

3. **Panel**: Panel was a three-dimensional structure that was also mutable, but it was deprecated in pandas 0.20.0 and removed in 0.25.0. For most use cases, a DataFrame with a MultiIndex can be used instead. If you're using a current version of pandas, the Panel class is not available.

In summary, Series and DataFrame are mutable; Panel was mutable too, but it has been removed from pandas.
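
A short sketch of in-place modification, using made-up values chosen only for illustration:

```python
import pandas as pd

# Series: overwrite a value in place
s = pd.Series([4, 8, 15])
s.iloc[0] = 100
print(s.tolist())

# DataFrame: overwrite a cell and add a column in place
df = pd.DataFrame({'Age': [24, 23]})
df.loc[0, 'Age'] = 30
df['City'] = ['Chicago', 'Houston']
print(df)

# drop() returns a new object by default; the original is unchanged
df_without_city = df.drop(columns='City')
print(list(df.columns), list(df_without_city.columns))
```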

In [32]:
# Q7. Create a DataFrame using multiple Series. Explain with an example.

You can create a DataFrame by combining multiple pandas Series. Each Series becomes a column in the resulting DataFrame. Here's an example:

In [33]:
import pandas as pd

names = pd.Series(['Henil', 'Hansraj', 'Abhi', 'Yash'])
ages = pd.Series([24, 23, 25, 22])
cities = pd.Series(['New York', 'Los Angeles', 'Chicago', 'Houston'])

data = {
    'Name': names,
    'Age': ages,
    'City': cities
}

df = pd.DataFrame(data)

print(df)

      Name  Age         City
0    Henil   24     New York
1  Hansraj   23  Los Angeles
2     Abhi   25      Chicago
3     Yash   22      Houston


In this example, we created three Pandas Series (**names**, **ages**, and **cities**) representing the names, ages, and cities of individuals. We then combined these Series into a DataFrame named **df** by creating a dictionary where the keys are the column names and the values are the corresponding Series. The resulting DataFrame has columns 'Name', 'Age', and 'City', each containing the values from the respective Series.
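
When the Series do not share the same index, pandas aligns them by label and fills the gaps with `NaN`. The sketch below assumes hypothetical labels `a`, `b`, and `c` to show this, and also shows `pd.concat` with `axis=1` as an alternative way to place Series side by side.

```python
import pandas as pd

names = pd.Series(['Henil', 'Hansraj', 'Abhi'], index=['a', 'b', 'c'])
ages = pd.Series([24, 23], index=['a', 'b'])   # no entry for label 'c'

# Values are matched by index label, not by position
df = pd.DataFrame({'Name': names, 'Age': ages})
print(df)

# Alternative: concatenate the Series column-wise; 'keys' supplies column names
concat_df = pd.concat([names, ages], axis=1, keys=['Name', 'Age'])
print(concat_df)
```

In both results the `Age` value for label `c` is missing, which also turns the `Age` column into a floating-point column.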