In [1]:
import pandas as pd

#### Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [4]:
a = pd.Series([4,8,15,16,23,42])
print(a)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


In [7]:
type(a)

pandas.core.series.Series

#### Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to it, and print the result.

In [6]:
b = [1, "a", "is", "pwskills", "10", 23, 45, 24, "ineuron", "mere laptop purana hai" ]
B = pd.Series(b)
print(B)

0                         1
1                         a
2                        is
3                  pwskills
4                        10
5                        23
6                        45
7                        24
8                   ineuron
9    mere laptop purana hai
dtype: object


In [8]:
type(B)

pandas.core.series.Series

#### Q3. Create a Pandas DataFrame that contains the following data (image: a table with Name, Age, and Gender columns). Then, print the DataFrame.

In [12]:
data = {'Name': ['Alice', 'Bob', 'Claire'],
        'Age': [25, 30, 27],
        'Gender': ['Female', 'Male', 'Female']}

df = pd.DataFrame(data)
print(df, type(df))

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female <class 'pandas.core.frame.DataFrame'>


#### Q4. What is a ‘DataFrame’ in pandas, and how is it different from a pandas.Series? Explain with an example.

In pandas, a **DataFrame** is a two-dimensional, labeled, tabular data structure designed to store and manipulate heterogeneous data. It is similar to a spreadsheet or SQL table, with data organized in rows and columns. Each column of a DataFrame is a Pandas Series, so a DataFrame can be thought of as a container of Series that share the same index.

Here's a brief comparison between a DataFrame and a Series:

1. **DataFrame:**
   - Two-dimensional table of data.
   - Consists of rows and columns.
   - Can store data of different data types in different columns.
   - Suitable for handling complex, structured data.

2. **Series:**
   - One-dimensional array.
   - Similar to a column in a DataFrame.
   - Holds a single dtype for all of its elements; mixed Python values are stored under the generic `object` dtype.
   - Useful for handling one-dimensional data.

Let's illustrate the difference with an example:

```python
import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}

df = pd.DataFrame(data)

# Creating a Series
ages_series = pd.Series([25, 30, 22], name='Age')

# Printing the DataFrame and Series
print("DataFrame:")
print(df)
print("\nSeries:")
print(ages_series)
```

In this example, the `df` DataFrame has three columns ('Name', 'Age', 'City'), where each column is a Pandas Series. The `ages_series` is a standalone Series representing the 'Age' column. The DataFrame provides a structure to organize multiple Series together, making it more convenient for handling and analyzing tabular data.
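The relationship described above can be checked directly. The following is a minimal sketch that reuses the same sample data; the variable name `age_column` is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

# Selecting a single column from a DataFrame returns a Series.
age_column = df['Age']

print(type(df).__name__, type(age_column).__name__)  # class names of each object
print(df.ndim, age_column.ndim)                      # number of dimensions
print(df.shape, age_column.shape)                    # (rows, columns) vs. (rows,)
print(df.dtypes)                                     # one dtype per column
```

The DataFrame reports two dimensions and a separate dtype for each column, while the selected column reports one dimension and carries the column label as its `name`.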

#### Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

Pandas provides a wide range of functions to manipulate and analyze data in a DataFrame. Here are some common functions:

1. **head() and tail():**
   - **Example:** `df.head()` returns the first 5 rows of the DataFrame, and `df.tail()` returns the last 5. Pass a number, as in `df.head(10)`, to change the count. Useful for quickly inspecting the structure of the data.

2. **info():**
   - **Example:** `df.info()` prints a concise summary of the DataFrame, including each column's data type, its non-null count, and the memory usage. Helpful for checking data types and spotting columns with missing values.

3. **describe():**
   - **Example:** `df.describe()` generates descriptive statistics for the numeric columns by default, such as count, mean, standard deviation, minimum, and maximum values. Useful for understanding the distribution of numerical data.

4. **loc[] and iloc[]:**
   - **Example:** `df.loc[:, 'ColumnName']` selects all rows for a specific column using the column name. `df.iloc[:, 0]` selects all rows for the first column using the column index. Useful for indexing and selecting specific data.

5. **drop():**
   - **Example:** `df.drop('ColumnName', axis=1)` drops a specific column from the DataFrame. Helpful for removing unnecessary columns.

6. **groupby():**
   - **Example:** `df.groupby('Category')['Column'].mean()` groups the DataFrame by a specific category and calculates the mean of a particular column for each group. Useful for aggregating and summarizing data.

7. **sort_values():**
   - **Example:** `df.sort_values(by='Column', ascending=False)` sorts the DataFrame based on values in a specific column in descending order. Useful for sorting data.

8. **fillna():**
   - **Example:** `df.fillna(value)` fills missing values in the DataFrame with a specified value. Helpful for handling missing data.

9. **apply():**
   - **Example:** `df['Column'].apply(lambda x: x * 2)` applies a custom function to each element in a column. Useful for applying a transformation to the data.

10. **merge():**
    - **Example:** `pd.merge(df1, df2, on='KeyColumn')` merges two DataFrames based on a common column. Useful for combining data from different sources.
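
As an example of when these functions are useful, suppose you want to rank product categories by total sales. The sketch below chains `fillna()`, `groupby()`, and `sort_values()`; the `sales` data and its column names are hypothetical and invented purely for illustration.

```python
import pandas as pd

# Hypothetical sales records (made-up data, not from a real dataset).
sales = pd.DataFrame({
    'Category': ['Books', 'Toys', 'Books', 'Toys', 'Games'],
    'Amount': [12.0, None, 8.0, 25.0, 15.0],
})

# fillna(): treat the missing amount as 0 (an assumption about this made-up data).
cleaned = sales.fillna({'Amount': 0.0})

# groupby(): total amount per category.
totals = cleaned.groupby('Category')['Amount'].sum()

# sort_values(): largest total first.
ranked = totals.sort_values(ascending=False)

print(cleaned.head(3))  # quick look at the first three cleaned rows
print(ranked)
```

Whether a missing value should become 0, the column mean, or be dropped depends on the data; filling with 0 is only one possible choice.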

#### Q6. Which of the following is mutable in nature: Series, DataFrame, Panel?

In pandas, Series, DataFrame, and Panel are all **value-mutable**: the values they contain can be changed after creation. They differ in whether their size can change.

- **Series:** Value-mutable, so you can update its elements in place. The pandas documentation describes it as size-immutable: its length is treated as fixed once it has been created.

- **DataFrame:** Value-mutable and size-mutable. You can update values, add or remove columns, and perform various other manipulations.

- **Panel:** A three-dimensional structure from earlier versions of pandas that was also mutable. It was deprecated in pandas 0.20 and removed in 0.25; MultiIndex DataFrames are now recommended for 3D data.

So, to summarize, all three are mutable in their values, and among the structures still available, DataFrame is the one that is also size-mutable.
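
A minimal sketch of what mutability looks like in practice: it updates a value in a Series, then updates a value and changes the set of columns in a DataFrame. The variable names and sample values are assumptions made for illustration.

```python
import pandas as pd

# A Series is value-mutable: an existing element can be overwritten in place.
s = pd.Series([4, 8, 15])
s.iloc[0] = 99

# A DataFrame is value-mutable as well ...
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df.loc[0, 'Age'] = 26

# ... and size-mutable: columns can be added or removed in place.
df['City'] = ['New York', 'San Francisco']
del df['Name']

print(s)
print(df)
```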

#### Q7. Create a DataFrame using multiple Series. Explain with an example.

We can create a DataFrame from multiple Series by combining them in a dictionary in which each key is a column name and each value is the corresponding Series. Here's an example:

```python
import pandas as pd

# Creating multiple Series
name_series = pd.Series(['Alice', 'Bob', 'Charlie'], name='Name')
age_series = pd.Series([25, 30, 22], name='Age')
city_series = pd.Series(['New York', 'San Francisco', 'Los Angeles'], name='City')

# Creating a DataFrame using the Series
data = {'Name': name_series, 'Age': age_series, 'City': city_series}
df = pd.DataFrame(data)

# Displaying the DataFrame
print(df)
```

**Output:**
```
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   22    Los Angeles
```

In this example, we create three Series (`name_series`, `age_series`, and `city_series`), each representing a column of the DataFrame. Then, we combine these Series into a dictionary called `data`, where the keys are the column names ('Name', 'Age', 'City') and the values are the corresponding Series. Finally, we use `pd.DataFrame(data)` to create a DataFrame from the dictionary.

The resulting DataFrame has three columns: 'Name', 'Age', and 'City', and each column is populated with the values from the respective Series.
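
As an alternative sketch, not part of the answer above, `pd.concat` with `axis=1` can combine named Series into columns without building a dictionary. The second half shows that pandas aligns Series on their index, so a Series that lacks a label produces a missing value in that row. The names `df_concat`, `partial_age`, and `df_aligned` are hypothetical.

```python
import pandas as pd

name_series = pd.Series(['Alice', 'Bob', 'Charlie'], name='Name')
age_series = pd.Series([25, 30, 22], name='Age')
city_series = pd.Series(['New York', 'San Francisco', 'Los Angeles'], name='City')

# concat along axis=1 places each Series side by side; column labels come from each Series' name.
df_concat = pd.concat([name_series, age_series, city_series], axis=1)
print(df_concat)

# Series are matched by index label, not by position.
# This Series has no entry for label 1, so that row gets NaN in the 'Age' column.
partial_age = pd.Series([25, 22], index=[0, 2], name='Age')
df_aligned = pd.DataFrame({'Name': name_series, 'Age': partial_age})
print(df_aligned)
```

Because of this alignment, it is worth checking that all Series share the same index before combining them.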