# Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [1]:
import pandas as pd

In [6]:
pandas_series = pd.Series([4,8,15,16,23,42])
print(pandas_series)
print(type(pandas_series))

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64
<class 'pandas.core.series.Series'>


# Q2. Create a list variable containing 10 elements, apply the pandas.Series function to it, and print the result.

In [10]:
ls = [1,2,3,4,5,6,7,8,9,'hello']
pd_series = pd.Series(ls)
print(pd_series)
print(type(pd_series))

0        1
1        2
2        3
3        4
4        5
5        6
6        7
7        8
8        9
9    hello
dtype: object
<class 'pandas.core.series.Series'>
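
The `dtype: object` in the output above arises because the list mixes integers with a string, so pandas falls back to a generic object dtype. A minimal sketch of the contrast follows; the variable names `mixed` and `numeric` are illustrative and not part of the original answer.

```python
import pandas as pd

mixed = pd.Series([1, 2, 3, 'hello'])  # integers plus one string
numeric = pd.Series([1, 2, 3, 4])      # integers only

print(mixed.dtype)    # generic object dtype
print(numeric.dtype)  # an integer dtype

# Each element of the object Series keeps its original Python type
element_types = [type(v).__name__ for v in mixed]
print(element_types)
```

Keeping a Series to a single type lets pandas use a numeric dtype, which is generally faster and supports arithmetic on the whole Series.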


# Q3. Create a Pandas DataFrame that contains the following data: Then, print the DataFrame.

In [12]:
data = [['Alice',25,'Female'],['Bob',30,'Male'],['Claire',27,'Female']]

df = pd.DataFrame(data, columns = ['Name','Age','Gender'])

print(df)


     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


In [13]:
data = {'Name':['Alice','Bob','Claire'],'Age':[25,30,27],'Gender':['Female','Male','Female']}

df = pd.DataFrame(data)

print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


# Q4. What is a 'DataFrame' in pandas and how is it different from pandas.Series? Explain with an example.

- In pandas, a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It is similar to a table in a database or an Excel spreadsheet. On the other hand, a Series is a one-dimensional labeled array capable of holding any data type (integer, string, float, etc.).

Here are the key differences:

1) Dimension:

- Series: One-dimensional.
- DataFrame: Two-dimensional.

2) Structure:

- Series: Similar to a single column of data.
- DataFrame: A collection of Series objects, which can be considered as a table with multiple columns.

3) Indexing:

- Series: Has a single axis (axis=0).
- DataFrame: Has two axes (axis=0 for rows and axis=1 for columns).

4) Usage:

- Series: Used when dealing with single-dimensional data, like a list.
- DataFrame: Used when dealing with two-dimensional data, like a table.
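
The differences listed above can be checked directly. The following is a small sketch with made-up data (the column names `math` and `art` are hypothetical) showing the dimensions, the fact that a single DataFrame column is a Series, and what the two axes mean for an aggregation.

```python
import pandas as pd

s = pd.Series([10, 20, 30], name='math')
df = pd.DataFrame({'math': [10, 20, 30], 'art': [1, 2, 3]})

# Dimension: a Series has one axis, a DataFrame has two
print(s.ndim, s.shape)
print(df.ndim, df.shape)

# Structure: selecting one column of a DataFrame yields a Series
column = df['math']
print(type(column).__name__)

# Indexing: axis=0 aggregates down the rows (one result per column),
# axis=1 aggregates across the columns (one result per row)
col_totals = df.sum(axis=0)
row_totals = df.sum(axis=1)
print(col_totals)
print(row_totals)
```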

In [17]:
# Example:
# Creating a Series:
data_series = pd.Series(data = [1,2,3,4,5], index = ['a','b','c','d','e'])
print('Series: ')
print(data_series)


Series: 
a    1
b    2
c    3
d    4
e    5
dtype: int64


In [18]:
# Creating a DataFrame
data = {
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}
data_frame = pd.DataFrame(data, index=['row1', 'row2', 'row3', 'row4'])
print("\nDataFrame:")
print(data_frame)


DataFrame:
      A  B   C
row1  1  5   9
row2  2  6  10
row3  3  7  11
row4  4  8  12


# Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

- Pandas provides a rich set of functions to manipulate data within a DataFrame. Here are some common functions and their uses:

Common Functions:

1) Head and Tail:

- df.head(n): Returns the first n rows of the DataFrame.
- df.tail(n): Returns the last n rows of the DataFrame.

2) Describe:

- df.describe(): Generates descriptive statistics of the DataFrame.

3) Info:

- df.info(): Provides a concise summary of the DataFrame, including the index dtype and columns, non-null values, and memory usage.

4) Shape:

- df.shape: Returns a tuple representing the dimensionality of the DataFrame (number of rows, number of columns).

5) Isnull:

- df.isnull(): Detects missing values, returning a DataFrame of the same shape with boolean values indicating the presence of null values.

6) Dropna:

- df.dropna(): Removes rows or columns with missing values.

7) Fillna:

- df.fillna(value): Fills missing values with the specified value.

8) Sort Values:

- df.sort_values(by, ascending=True): Sorts the DataFrame by the specified column(s).

9) Groupby:

- df.groupby(by): Groups rows by one or more columns (or a mapper) so that an aggregation such as sum() or mean() can be applied to each group.

10) Merge and Join:

- pd.merge(left, right, how, on): Merges two DataFrames using database-style join operations.
- df.join(other, on): Joins the columns of another DataFrame, aligning on the index by default.
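
A few of the functions listed above, sketched on small made-up tables. The `sales` and `prices` DataFrames and their values are assumptions for illustration only. The sketch covers `dropna`, `pd.merge`, and `df.join`.

```python
import pandas as pd

sales = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D'],
    'Sales': [100.0, None, 300.0, 150.0],
})
prices = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Price': [9.5, 4.0, 12.0],
})

# dropna: keep only the rows that have no missing values
complete = sales.dropna()

# merge: inner join on the shared 'Product' column (drops product D)
merged = pd.merge(sales, prices, how='inner', on='Product')

# join: aligns on the index, so 'Product' is set as the index on both sides;
# the default is a left join, so product D is kept with a missing Price
joined = sales.set_index('Product').join(prices.set_index('Product'))

print(complete)
print(merged)
print(joined)
```

`merge` is usually the more convenient choice when the key is an ordinary column, while `join` fits tables that are already indexed by the key.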

In [19]:
import pandas as pd

# Sample data
data = {
    'Product': ['A', 'B', 'A', 'B', 'C', 'A', 'C', 'B'],
    'Sales': [100, 200, None, 150, 300, None, 350, 400],
    'Quantity': [1, 2, 1, 1, 3, 1, 2, 3]
}

# Creating DataFrame
df = pd.DataFrame(data)

In [21]:
# 1. Display the first few rows
print("Head of the DataFrame:")
print(df.head())

# 2. Get summary statistics
print("\nDescriptive statistics:")
print(df.describe())

# 3. Check for missing values
print("\nCheck for missing values:")
print(df.isnull())

# 4. Fill missing values in 'Sales' with the mean value of 'Sales'
df['Sales'] = df['Sales'].fillna(df['Sales'].mean())
print("\nDataFrame after filling missing values:")
print(df)

# 5. Group by 'Product' and calculate total sales
grouped_df = df.groupby('Product').sum()
print("\nTotal sales by product:")
print(grouped_df)

# 6. Sort values by 'Sales'
sorted_df = grouped_df.sort_values(by='Sales', ascending=False)
print("\nSorted total sales by product:")
print(sorted_df)

Head of the DataFrame:
  Product  Sales  Quantity
0       A  100.0         1
1       B  200.0         2
2       A    NaN         1
3       B  150.0         1
4       C  300.0         3

Descriptive statistics:
            Sales  Quantity
count    6.000000  8.000000
mean   250.000000  1.750000
std    118.321596  0.886405
min    100.000000  1.000000
25%    162.500000  1.000000
50%    250.000000  1.500000
75%    337.500000  2.250000
max    400.000000  3.000000

Check for missing values:
   Product  Sales  Quantity
0    False  False     False
1    False  False     False
2    False   True     False
3    False  False     False
4    False  False     False
5    False   True     False
6    False  False     False
7    False  False     False

DataFrame after filling missing values:
  Product  Sales  Quantity
0       A  100.0         1
1       B  200.0         2
2       A  250.0         1
3       B  150.0         1
4       C  300.0         3
5       A  250.0         1
6       C  350.0         2
7       B  400.0         3

# Q6. Which of the following is mutable in nature Series, DataFrame, Panel?

- In pandas, both Series and DataFrame are mutable: their values can be changed in place. Panel was also mutable, but it was deprecated in pandas 0.20.0 and removed in 0.25.0 in favor of multi-indexed DataFrames.

In [22]:
# Mutable Nature of pandas Structures
# Series: A one-dimensional array with axis labels. It is mutable, meaning you can change its elements, indices, and even add or remove elements.
import pandas as pd

# Creating a Series
s = pd.Series([1, 2, 3, 4])
print("Original Series:")
print(s)

# Modifying an element
s[1] = 5
print("\nModified Series:")
print(s)


Original Series:
0    1
1    2
2    3
3    4
dtype: int64

Modified Series:
0    1
1    5
2    3
3    4
dtype: int64


In [23]:
# DataFrame: A two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It is mutable, allowing changes to elements, rows, columns, and even the structure of the DataFrame itself
# Creating a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print("Original DataFrame:")
print(df)

# Modifying an element
df.at[0, 'A'] = 10
print("\nModified DataFrame:")
print(df)


Original DataFrame:
   A  B
0  1  4
1  2  5
2  3  6

Modified DataFrame:
    A  B
0  10  4
1   2  5
2   3  6
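
Besides changing individual values, a DataFrame can also change its shape in place. A short sketch, reusing the same small table as above (the new column `C` and the new row label `3` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Add a column in place
df['C'] = df['A'] + df['B']

# Remove a column in place
del df['B']

# Add a row in place by assigning to a new index label
df.loc[3] = [4, 12]

print(df)
```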


- Deprecation of Panel
Panel: A three-dimensional data structure that was mutable but has been deprecated and later removed from pandas. Users are encouraged to use MultiIndex DataFrames for three-dimensional data.
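
As a rough sketch of the MultiIndex approach, the example below stores what would once have been a Panel (items x dates x measurements) in a single DataFrame. The item names, dates, and values are hypothetical.

```python
import pandas as pd

# Two items observed on two dates, with two measurements each
index = pd.MultiIndex.from_product(
    [['item1', 'item2'], ['2024-01-01', '2024-01-02']],
    names=['item', 'date'],
)
panel_like = pd.DataFrame(
    {'open': [1.0, 1.1, 2.0, 2.1], 'close': [1.2, 1.3, 2.2, 2.3]},
    index=index,
)
print(panel_like)

# Two-dimensional slice for one item (rows are dates)
item1 = panel_like.loc['item1']
print(item1)

# Cross-section for one date across all items (rows are items)
day1 = panel_like.xs('2024-01-01', level='date')
print(day1)
```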

# Q7. Create a DataFrame using multiple Series. Explain with an example.

- Creating a DataFrame using multiple Series in pandas involves combining these Series into a two-dimensional data structure. Each Series can be thought of as a column in the resulting DataFrame. Here’s an example to illustrate this process:

Let's create a DataFrame using three different Series, representing student data.

1) Creating Series:

- names: Series containing student names.
- ages: Series containing student ages.
- grades: Series containing student grades.

2) Combining Series into a DataFrame:

- Use pd.DataFrame() to combine these Series into a DataFrame.

In [24]:
import pandas as pd

# Creating Series
names = pd.Series(['Alice', 'Bob', 'Charlie', 'David'])
ages = pd.Series([24, 27, 22, 23])
grades = pd.Series(['A', 'B', 'A', 'B'])

# Combining Series into a DataFrame
student_data = pd.DataFrame({
    'Name': names,
    'Age': ages,
    'Grade': grades
})

print("DataFrame:")
print(student_data)


DataFrame:
      Name  Age Grade
0    Alice   24     A
1      Bob   27     B
2  Charlie   22     A
3    David   23     B
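
The Series above all share the default index 0 to 3, so they line up row by row. When the indexes differ, pandas aligns on the index labels and fills the gaps with NaN. The sketch below uses `pd.concat` with `axis=1` as an alternative way to combine Series; the deliberately mismatched index is an assumption made for illustration.

```python
import pandas as pd

names = pd.Series(['Alice', 'Bob', 'Charlie'], index=[0, 1, 2], name='Name')
ages = pd.Series([24, 27, 23], index=[0, 1, 3], name='Age')  # label 3, not 2

# concat along axis=1 uses each Series' name as the column name
# and aligns rows on the union of the index labels
combined = pd.concat([names, ages], axis=1)
print(combined)
```

Rows whose label is missing from one of the Series show NaN in that column, which is worth checking for before combining Series that come from different sources.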
