Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [1]:
import pandas as pd
data=[4,8,15,16,23,42]
series=pd.Series(data)
print(series)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to the
variable, and print the result.

In [3]:
import pandas as pd
my_list=[10,20,30,40,50,60,70,80,90,100]
series=pd.Series(my_list)
print(series)

0     10
1     20
2     30
3     40
4     50
5     60
6     70
7     80
8     90
9    100
dtype: int64


Q3. Create a Pandas DataFrame that contains the following data:

| Name   | Age | Gender |
|--------|-----|--------|
| Alice  | 25  | Female |
| Bob    | 30  | Male   |
| Claire | 27  | Female |

Then, print the DataFrame.

In [8]:
import pandas as pd
# Column labels and values match the data given in the question
data={
    'Name':['Alice','Bob','Claire'],
    'Age':[25,30,27],
    'Gender':['Female','Male','Female']
}

df=pd.DataFrame(data)
print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


Q4. What is a ‘DataFrame’ in pandas, and how is it different from a pandas.Series? Explain with an example.
Answer:
In Pandas, a **DataFrame** is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It can be thought of as a collection of Pandas Series objects where each Series represents a column, and the columns are aligned by a common index. DataFrames are one of the most commonly used data structures in Pandas and are suitable for handling structured data, such as data in CSV files, Excel spreadsheets, or SQL tables.

A **Pandas Series**, on the other hand, is a one-dimensional labeled array capable of holding data of any type. It is like a column in a DataFrame. A Series has a one-dimensional structure with an associated index, which can be thought of as row labels. 

Here's an example to illustrate the difference between a DataFrame and a Series:

```python
import pandas as pd

# Creating a Pandas Series
ser = pd.Series([1, 2, 3, 4], name='MySeries')
print("Pandas Series:")
print(ser)
```

Output:

```
Pandas Series:
0    1
1    2
2    3
3    4
Name: MySeries, dtype: int64
```

In this example, we created a Pandas Series named 'MySeries' with values 1, 2, 3, and 4. It's a one-dimensional structure with an index (0, 1, 2, 3).

Now, let's create a DataFrame:

```python
# Creating a Pandas DataFrame
data = {'Column1': [1, 2, 3, 4],
        'Column2': ['A', 'B', 'C', 'D']}
df = pd.DataFrame(data)
print("\nPandas DataFrame:")
print(df)
```

Output:

```
Pandas DataFrame:
   Column1 Column2
0        1       A
1        2       B
2        3       C
3        4       D
```

In this example, we created a Pandas DataFrame with two columns, 'Column1' and 'Column2'. It's a two-dimensional structure where 'Column1' is a Pandas Series containing numeric data, and 'Column2' is a Pandas Series containing string data. The DataFrame has an index (0, 1, 2, 3) for rows, and each column has its own label.

In summary, a Pandas Series is a one-dimensional data structure, whereas a Pandas DataFrame is a two-dimensional data structure that can hold multiple columns, making it suitable for handling structured data with multiple attributes.
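
The relationship between the two structures can also be checked directly. The following is a minimal sketch that reuses the column names from the example above: selecting one column with single brackets returns a Series, while selecting with a list of names keeps the result a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3, 4],
                   'Column2': ['A', 'B', 'C', 'D']})

# Selecting one column with single brackets returns a Series
col = df['Column1']
print(type(col).__name__)   # Series
print(col.ndim, df.ndim)    # 1 2

# Selecting with a list of column names keeps the result a DataFrame
sub = df[['Column1']]
print(type(sub).__name__)   # DataFrame
print(df.shape)             # (4, 2)
```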

Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can
you give an example of when you might use one of these functions?

Pandas provides a wide range of functions to manipulate data in a DataFrame. Some common functions and methods include:

1. **Selection and Filtering**:
   - `df['column_name']` or `df.column_name`: Select a single column.
   - `df[['col1', 'col2']]`: Select multiple columns.
   - `df.loc[row_label]` or `df.iloc[row_position]`: Select rows by index label or by integer position, respectively.
   - `df[df['column'] > 5]`: Filter rows based on a condition.

2. **Sorting**:
   - `df.sort_values(by='column_name')`: Sort DataFrame by column.
   - `df.sort_index()`: Sort by index.
   
3. **Aggregation and Summary Statistics**:
   - `df.groupby('column_name').mean(numeric_only=True)`: Compute the mean of the numeric columns for each group.
   - `df['column_name'].sum()`: Calculate the sum of a column.
   - `df.describe()`: Get summary statistics.

4. **Data Cleaning**:
   - `df.dropna()`: Remove rows with missing values.
   - `df.fillna(value)`: Fill missing values with a specified value.
   - `df.drop(columns='column_name')`: Drop a column.

5. **Merging and Joining Data**:
   - `pd.concat([df1, df2])`: Concatenate DataFrames.
   - `pd.merge(df1, df2, on='key_column')`: Merge DataFrames based on a common column.

6. **Data Transformation**:
   - `df.apply(function)`: Apply a function to each column (the default) or to each row (`axis=1`).
   - `df.pivot(index='row_column', columns='column_column', values='value_column')`: Pivot the DataFrame.
   - `df.rename(columns={'old_name': 'new_name'})`: Rename columns.

7. **Data Visualization**:
   - `df.plot(kind='plot_type')`: Create various plots, such as line, bar, or scatter plots.

8. **Data Export/Import**:
   - `df.to_csv('filename.csv')`: Export DataFrame to a CSV file.
   - `pd.read_csv('filename.csv')`: Import data from a CSV file.
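
A few of the calls listed above (filtering, sorting, and cleaning) can be sketched on a small DataFrame. The names and scores here are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical data, made up for illustration; one score is missing
df = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cara', 'Dev'],
    'score': [82.0, float('nan'), 91.0, 67.0],
})

# Filtering: keep rows where score is above 70 (a missing score compares as False)
high = df[df['score'] > 70]

# Sorting: highest score first; missing values are placed last by default
ranked = df.sort_values(by='score', ascending=False)

# Cleaning: either drop the incomplete row or fill it with a default value
dropped = df.dropna()
filled = df.fillna({'score': 0.0})

print(high)
print(ranked)
print(len(dropped), filled['score'].tolist())
```

None of these calls modifies `df` itself; each returns a new object, which is why the results are assigned to new variables.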

Here's an example of when you might use one of these functions. Let's say you have a DataFrame containing sales data, and you want to calculate the total sales for each product category:

```python
import pandas as pd

data = {
    'Product': ['A', 'B', 'A', 'C', 'B', 'C'],
    'Sales': [100, 200, 150, 120, 250, 80]
}

df = pd.DataFrame(data)

# Group by 'Product' and calculate the total sales for each category
total_sales_by_category = df.groupby('Product')['Sales'].sum()

print(total_sales_by_category)
```

Output:

```
Product
A    250
B    450
C    200
Name: Sales, dtype: int64
```

In this example, we used the `groupby()` function to group the data by the 'Product' column and then applied the `sum()` function to calculate the total sales for each product category. This demonstrates the use of aggregation and grouping functions in Pandas to manipulate and analyze data in a DataFrame.
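
Merging combines naturally with grouping. As a rough sketch, assume a second, hypothetical lookup table (`categories`, not part of the original data) that maps each product to a category. A left merge attaches the category to every sales row, after which the same `groupby()` pattern applies.

```python
import pandas as pd

sales = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'B', 'C'],
    'Sales': [100, 200, 150, 120, 250, 80],
})

# Hypothetical lookup table, invented for this illustration
categories = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Category': ['Toys', 'Toys', 'Books'],
})

# Left merge keeps every sales row and attaches its category
merged = pd.merge(sales, categories, on='Product', how='left')

# Total sales per category
totals = merged.groupby('Category')['Sales'].sum()
print(totals)
```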

Q6. Which of the following are mutable in nature: Series, DataFrame, Panel?

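**Series** and **DataFrame** in Pandas are both mutable in nature, meaning the values they hold can be changed after creation. The Pandas documentation distinguishes value-mutability from size-mutability, though: it describes the length of a Series as fixed, whereas a DataFrame can have columns inserted or removed after creation.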

However, please note that **Panel** was deprecated in Pandas 0.20.0 and removed entirely in version 0.25.0, so it is no longer available for handling multi-dimensional data. Instead, Pandas recommends representing such data with a MultiIndex DataFrame (or with a separate library such as xarray), and a DataFrame itself can be considered a two-dimensional generalization of a Series.

So, if you are working with Pandas 0.25.0 or later, the Panel data structure does not exist and is not relevant to data manipulation tasks. Instead, you work with Series and DataFrames, both of which are mutable.
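
What mutability means in practice can be sketched in a few lines. This is a minimal illustration with made-up values: a value in a Series is changed in place, and a DataFrame gains a column, has one cell updated, and then loses a column.

```python
import pandas as pd

# A Series is value-mutable: an existing element can be overwritten
s = pd.Series([4, 8, 15])
s.iloc[0] = 99
print(s.tolist())              # [99, 8, 15]

# A DataFrame is also size-mutable: columns can be added and removed
df = pd.DataFrame({'Age': [25, 30, 27]})
df['Gender'] = ['Female', 'Male', 'Female']   # add a column in place
df.loc[1, 'Age'] = 31                         # change a single cell
df = df.drop(columns='Age')                   # drop() returns a new DataFrame
print(df.columns.tolist())     # ['Gender']
```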

Q7. Create a DataFrame using multiple Series. Explain with an example.

You can create a Pandas DataFrame using multiple Series by passing those Series as a dictionary to the `pd.DataFrame()` constructor. Each Series will become a column in the resulting DataFrame. Here's an example to illustrate this:

```python
import pandas as pd

# Creating multiple Series
name_series = pd.Series(['Alice', 'Bob', 'Claire'])
age_series = pd.Series([25, 30, 27])
gender_series = pd.Series(['Female', 'Male', 'Female'])

# Creating a DataFrame using the Series
data = {
    'Name': name_series,
    'Age': age_series,
    'Gender': gender_series
}

df = pd.DataFrame(data)

# Print the resulting DataFrame
print(df)
```

Output:

```
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
```

In this example, we first create three Pandas Series: `name_series`, `age_series`, and `gender_series`. Each Series represents a different attribute of individuals (Name, Age, and Gender).

Then, we create a DataFrame `df` by passing these Series as a dictionary to the `pd.DataFrame()` constructor. Each Series becomes a column in the DataFrame, and the resulting DataFrame has columns labeled 'Name', 'Age', and 'Gender' with the corresponding values from the Series.

This is a common way to create a DataFrame when you have multiple Series representing different attributes of your data, and you want to combine them into a single structured DataFrame.
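
One detail worth knowing is that the constructor aligns the Series by index label, not by position. In the example above all three Series share the default index 0, 1, 2, so the rows line up as expected. The sketch below uses hypothetical Series with different labels to show what happens otherwise: the resulting index is the union of the labels, and missing entries become `NaN`.

```python
import pandas as pd

# Hypothetical Series whose index labels only partly overlap
age = pd.Series([25, 30], index=['Alice', 'Bob'])
gender = pd.Series(['Male', 'Female'], index=['Bob', 'Claire'])

# Columns are aligned on index labels; gaps are filled with NaN
df = pd.DataFrame({'Age': age, 'Gender': gender})
print(df)
```

Here only 'Bob' has both an age and a gender. 'Alice' has no gender and 'Claire' has no age, so those cells are `NaN`, and the 'Age' column is stored as floats to accommodate the missing value.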