# Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [1]:
import pandas as pd
data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)

In [2]:
series

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64

# Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to it, and print the result.

In [3]:
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
series = pd.Series(my_list)
print(series)

0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64


# Q3. Create a Pandas DataFrame that contains the following data:
![data.JPG](attachment:1f223094-0b7b-4b67-8f5a-7cdbd9eb4ffe.JPG)
# Then, print the DataFrame.

In [4]:
data = {'Name': ['Alice','Bob','Claire'],
        'Age': [25, 30, 27],
        'Gender': ['Female', 'Male', 'Female']}
df = pd.DataFrame(data)

In [5]:
df

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


# Q4. What is a 'DataFrame' in pandas and how is it different from pandas.Series? Explain with an example.

In pandas, a DataFrame is a two-dimensional, size-mutable, and heterogeneous data structure that is widely used for data manipulation and analysis. It can be thought of as a table or a spreadsheet-like data structure where data is organized in rows and columns. Each column in a DataFrame represents a variable, while each row represents an observation or a record. This makes it suitable for handling structured data, like the data you might find in a CSV file or a database table.

On the other hand, a Series is a one-dimensional labeled array in pandas. It is similar to a Python list, but with additional capabilities and properties. A Series is designed to hold a single column of data and the corresponding row labels (indices). It is like a single column of a DataFrame.

Here's a more detailed explanation of the differences between a DataFrame and a Series:

##### Dimensionality:
1. DataFrame: It is a two-dimensional data structure, meaning it has both rows and columns. It can be represented as a table.
2. Series: It is a one-dimensional data structure, meaning it has only one axis (the index).

##### Data Structure:
1. DataFrame: It can be visualized as a collection of Series where each Series represents a column, and all the Series share the same index.
2. Series: It represents a single column of data with an associated index.

##### Representation:
1. DataFrame: It is a tabular structure with rows and columns, similar to a spreadsheet or a SQL table.
2. Series: It is a single column with labels for each row, similar to a Python dictionary.
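
The differences listed above can be checked directly on a small object. The following is a minimal sketch, assuming a hypothetical two-column table (the variable names `people` and `ages` are illustrative only); it compares the `ndim` and `shape` attributes of a DataFrame with those of one of its columns.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Claire'],
                       'Age': [25, 30, 27]})

# Selecting a single column returns a Series.
ages = people['Age']

print(people.ndim, people.shape)   # dimensions and (rows, columns) of the DataFrame
print(ages.ndim, ages.shape)       # dimensions and (length,) of the Series
print(type(ages).__name__)         # class name of the selected column

# The column shares its row labels with the parent DataFrame.
print(ages.index.equals(people.index))
```

The column keeps the row labels of the DataFrame it came from, which is what lets several Series line up as columns of one table.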


In [6]:
data = {'Name': ['Alice','Bob','Claire'],
        'Age': [25, 30, 27],
        'Gender': ['Female', 'Male', 'Female']}
df = pd.DataFrame(data)

In [7]:
df

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


In [8]:
df['Age']

0    25
1    30
2    27
Name: Age, dtype: int64

# Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

Pandas provides a wide range of functions to manipulate data in a DataFrame. These functions cover data cleaning, filtering, transformation, aggregation, and more. Here are some common functions used for DataFrame manipulation, each with a short explanation and an example:

##### head() and tail():
1. head(): Returns the first n rows of the DataFrame (default n=5).
2. tail(): Returns the last n rows of the DataFrame (default n=5).

In [9]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace'],
    'Age': [25, 30, 22, 28, 35, 40, 32]}

df = pd.DataFrame(data)

print("First 3 rows:")
print(df.head(3))

print("\nLast 2 rows:")
print(df.tail(2))

First 3 rows:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   22

Last 2 rows:
    Name  Age
5  Frank   40
6  Grace   32


##### describe():
Generates descriptive statistics for the numeric columns of the DataFrame: count, mean, standard deviation, minimum, maximum, and the quartiles (the 50% row is the median).

In [10]:
data = {'Age': [25, 30, 22, 28, 35, 40, 32]}

df = pd.DataFrame(data)

print(df.describe())

             Age
count   7.000000
mean   30.285714
std     6.074929
min    22.000000
25%    26.500000
50%    30.000000
75%    33.500000
max    40.000000


##### sort_values():
Sorts the DataFrame based on the values of one or more columns.

In [11]:
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 28]}

df = pd.DataFrame(data)

df_sorted = df.sort_values(by='Age')

print(df_sorted)

      Name  Age
2  Charlie   22
0    Alice   25
3    David   28
1      Bob   30
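
Since `sort_values()` accepts more than one column, here is a minimal sketch of a multi-column sort. The data is hypothetical and deliberately contains a tie in 'Age' so that the second key has an effect; the names `people` and `people_sorted` are illustrative.

```python
import pandas as pd

# Hypothetical data with two people of the same age.
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                       'Age': [25, 30, 25, 28]})

# Sort by Age (descending), then by Name (ascending) to break ties.
people_sorted = people.sort_values(by=['Age', 'Name'],
                                   ascending=[False, True])

print(people_sorted)
```

Passing a list to `ascending` sets the direction for each key separately. `sort_values()` returns a new DataFrame and leaves `people` unchanged.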


##### groupby() and aggregation functions:
1. groupby(): Groups the DataFrame based on the values of one or more columns.
2. Aggregation functions (e.g., sum(), mean(), count(), max(), min(), etc.): Perform operations on grouped data.

In [12]:
data = {'City': ['New York', 'London', 'Paris', 'New York', 'London'],
    'Population': [8500000, 8900000, 2200000, 8600000, 9000000]}

df = pd.DataFrame(data)

population_mean_by_city = df.groupby('City')['Population'].mean()

print(population_mean_by_city)

City
London      8950000.0
New York    8550000.0
Paris       2200000.0
Name: Population, dtype: float64
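
Several aggregations can also be computed in one pass. The sketch below reuses the same city data and assumes that the count, the mean, and the maximum per city are wanted; the names `city_df` and `summary` are illustrative.

```python
import pandas as pd

city_df = pd.DataFrame({
    'City': ['New York', 'London', 'Paris', 'New York', 'London'],
    'Population': [8500000, 8900000, 2200000, 8600000, 9000000]})

# agg() with a list of function names returns one column per aggregation.
summary = city_df.groupby('City')['Population'].agg(['count', 'mean', 'max'])

print(summary)
```

The result is a DataFrame indexed by city, with one column for each requested statistic.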


# Q6. Which of the following is mutable in nature Series, DataFrame, Panel?

All three structures are value-mutable: the values they hold can be changed after creation. They differ in size mutability. The pandas documentation describes DataFrame (and the former Panel) as size-mutable and Series as size-immutable.

1. Series: A Series in pandas is a one-dimensional labeled array. Its elements can be modified after creation (value-mutable), but the pandas documentation describes its length as fixed (size-immutable).

2. DataFrame: A DataFrame in pandas is a two-dimensional, size-mutable, and heterogeneous data structure. You can add or delete rows and columns and update the values it contains.

3. Panel: Panel was a three-dimensional data structure, similar to a DataFrame with an extra dimension. It was deprecated in pandas 0.20.0 and removed in 0.25.0, so it is no longer available in current versions. Like DataFrames, Panels were mutable.
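
A minimal sketch of what mutability looks like in practice, using hypothetical data (the names `numbers` and `people` are illustrative). It shows an in-place value update on a Series, and a value update plus the insertion and deletion of a column on a DataFrame. Panel is omitted because it no longer exists in current pandas releases.

```python
import pandas as pd

# Series: an existing element can be overwritten in place.
numbers = pd.Series([4, 8, 15])
numbers.iloc[0] = 100
print(numbers.tolist())

# DataFrame: values can be updated, and columns can be added or removed.
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Claire'],
                       'Age': [25, 30, 27]})
people.loc[0, 'Age'] = 26                             # update one value
people['City'] = ['New York', 'London', 'Paris']      # insert a new column
print(people.shape)

del people['City']                                    # remove the column again
print(people.shape)
```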

# Q7. Create a DataFrame using multiple Series. Explain with an example.

In [13]:
name_series = pd.Series(['Alice', 'Bob', 'Charlie', 'David'])
age_series = pd.Series([25, 30, 22, 28])
city_series = pd.Series(['New York', 'London', 'Paris', 'Tokyo'])

df1 = pd.DataFrame({
    'Name': name_series,
    'Age': age_series,
    'City': city_series})

print(df1)

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
3    David   28     Tokyo


In the example above, we created three separate Series: name_series, age_series, and city_series. Each Series represents a column of data that we want to include in the DataFrame. We then passed them to the pd.DataFrame() constructor as a dictionary, so each key became a column name and each Series became the corresponding column of df1.

As we can see from the output, the resulting DataFrame has three columns: 'Name', 'Age', and 'City'. Each row represents an observation or a record, and the data from the corresponding Series is placed in the respective column.

By creating DataFrames from multiple Series, we can organize our data in a structured way, which makes it easier to perform data analysis and other operations.
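
One detail worth knowing when combining Series this way is that pandas aligns them by index label rather than by position. The sketch below assumes two hypothetical Series with custom string labels, one of which has no entry for the label 'c'; the name `aligned` is illustrative.

```python
import pandas as pd

# Hypothetical Series with explicit labels; age_series has no entry for 'c'.
name_series = pd.Series(['Alice', 'Bob', 'Charlie'], index=['a', 'b', 'c'])
age_series = pd.Series([25, 30], index=['a', 'b'])

# Rows are matched by label; missing entries are filled with NaN.
aligned = pd.DataFrame({'Name': name_series, 'Age': age_series})

print(aligned)
print(aligned['Age'].isna().tolist())
```

Because the 'Age' column now contains a missing value, pandas stores it as floating point instead of integer.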