## 1

In [3]:
import pandas as pd

data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)

print(series)


0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


## 2

In [4]:
import pandas as pd
my_list = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
my_series = pd.Series(my_list)
print(my_series)


0     10
1     20
2     30
3     40
4     50
5     60
6     70
7     80
8     90
9    100
dtype: int64


## 3

In [5]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

df = pd.DataFrame(data)

print(df)


     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


## 4

In Pandas, both DataFrame and Series are fundamental data structures for handling and analyzing labeled data, but they have different characteristics and use cases:

DataFrame:
- A DataFrame is a two-dimensional, labeled data structure that resembles a table or spreadsheet with rows and columns.
- It can store heterogeneous data types in different columns.
- Each column in a DataFrame is a Pandas Series, so a DataFrame can be thought of as a collection of Series that share a common index.
- DataFrames are useful for working with structured, tabular data where we need to perform operations on multiple columns simultaneously.
- They have both row and column labels, making it easy to select, filter, and manipulate data.

In [1]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

df = pd.DataFrame(data)
print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


Series:
- A Series is a one-dimensional labeled array that can hold data of any type (e.g., integers, strings, or floats).
- It is like a single column from a DataFrame.
- Series are useful for representing and manipulating one-dimensional data.
- They have a label (index) associated with each data point.

In [2]:
import pandas as pd

ages = pd.Series([25, 30, 27])
print(ages)

0    25
1    30
2    27
dtype: int64


In this example, `ages` is a Pandas Series with integer values and an automatically generated numeric index.

In summary, while both DataFrames and Series are essential components of Pandas, DataFrames are used for two-dimensional tabular data with multiple columns, whereas Series are used for one-dimensional data, such as a single column or row from a DataFrame. DataFrames can be thought of as collections of Series that share the same index, and they provide powerful tools for data manipulation and analysis on structured data.
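The claim that a DataFrame is a collection of Series can be checked directly. The sketch below rebuilds the same `Name`/`Age`/`Gender` DataFrame. It shows that selecting one column or one row gives back a Series, while selecting with a list of column names gives back a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

# A single column is a Series that shares the DataFrame's index
age_col = df['Age']
print(type(age_col))

# A single row selected by label is also a Series, indexed by the column names
first_row = df.loc[0]
print(first_row)

# A list of column names returns a DataFrame, even when it holds one column
age_frame = df[['Age']]
print(type(age_frame))
```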

## 5

Pandas is a popular Python library for data manipulation and analysis, and it provides a wide range of functions and methods to manipulate data within a DataFrame. Here are some common functions we can use to manipulate data in a Pandas DataFrame, along with examples of when we might use them:

1. Filtering Data:
`df[df['column_name'] > value]`: Filter rows based on a condition. For example, keep only the rows where a specific column's values are greater than a certain value.

In [5]:
import pandas as pd
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data)
filtered_df = df[df['A'] > 2]
print(filtered_df)

   A  B
2  3  7
3  4  8
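Conditions can also be combined. This is a small sketch reusing the same `A`/`B` data. Each condition must be wrapped in parentheses when joined with `&` (and) or `|` (or). `isin` and `query` are alternative spellings for common filters.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# Combine conditions with & (and) or | (or); each condition needs parentheses
both = df[(df['A'] > 1) & (df['B'] < 8)]

# Keep rows whose value in column 'A' appears in a given list
picked = df[df['A'].isin([1, 4])]

# query() expresses the same filter as `both` in a string
same_as_both = df.query('A > 1 and B < 8')

print(both)
print(picked)
```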



2. Sorting Data:
`df.sort_values(by='column_name')`: Sort the DataFrame by the values in a specific column (ascending by default; pass `ascending=False` for descending order).

   sorted_df = df.sort_values(by='A')
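A minimal sorting sketch on hypothetical unsorted data. It shows that rows keep their original index labels after sorting and that the original DataFrame is left unchanged. `reset_index(drop=True)` renumbers the rows from 0 if the old labels are not wanted.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 1, 4, 2], 'B': [7, 5, 8, 6]})

# Ascending sort by column 'A' (the default); returns a new DataFrame
sorted_df = df.sort_values(by='A')

# Descending sort; the original index labels travel with their rows
desc_df = df.sort_values(by='A', ascending=False)

# Discard the old labels and renumber the rows from 0
renumbered = df.sort_values(by='A').reset_index(drop=True)

print(sorted_df)
print(desc_df)
print(renumbered)
```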


3. Grouping Data:
`df.groupby('column_name')`: Group rows based on the unique values in a specific column. We can then apply aggregation functions to each group.

   grouped = df.groupby('B')
   avg_values = grouped['A'].mean()  # mean of column 'A' within each group of 'B'
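In the `A`/`B` example every value of `B` is unique, so each group holds a single row. The sketch below uses hypothetical sales data with repeated keys so that the per-group mean is meaningful. The column names `store` and `sales` are illustrative only.

```python
import pandas as pd

# Hypothetical sales data with repeated values in the grouping column
df = pd.DataFrame({
    'store': ['North', 'South', 'North', 'South', 'North'],
    'sales': [100, 80, 120, 60, 110],
})

grouped = df.groupby('store')

# Mean of 'sales' within each store; the result is a Series indexed by store
avg_sales = grouped['sales'].mean()
print(avg_sales)

# Number of rows in each group
print(grouped.size())
```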
 

4. Aggregating Data:
`df.groupby('column_name').agg({'agg_column': 'agg_function'})`: Calculate aggregate statistics for the groups created by `groupby`, for example the mean of a column within each group.

   agg_df = df.groupby('B').agg({'A': 'mean'})
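The dictionary passed to `agg` can list several functions per column. pandas also supports "named aggregation", which produces flat column names. Here is a sketch on hypothetical data (`store`, `sales`, and `units` are made-up names).

```python
import pandas as pd

df = pd.DataFrame({
    'store': ['North', 'South', 'North', 'South', 'North'],
    'sales': [100, 80, 120, 60, 110],
    'units': [10, 8, 12, 5, 11],
})

# One or more aggregation functions per column, passed as a dict;
# the result has two-level column labels such as ('sales', 'mean')
agg_df = df.groupby('store').agg({'sales': ['mean', 'max'], 'units': 'sum'})
print(agg_df)

# Named aggregation: new_column=(source_column, function)
named = df.groupby('store').agg(
    avg_sales=('sales', 'mean'),
    total_units=('units', 'sum'),
)
print(named)
```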


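5. Merging and Joining Data:
`pd.merge(df1, df2, on='key_column')`: Merge two DataFrames based on a common key column (an inner join by default; use `how='left'`, `'right'`, or `'outer'` for other join types).

   merged_df = pd.merge(df1, df2, on='key_column')  # df1 and df2 must both contain 'key_column'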
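The snippet above refers to `df1` and `df2` without defining them. This runnable sketch defines two hypothetical tables that share `key_column` and compares the default inner join with a left join.

```python
import pandas as pd

# Two hypothetical tables that share the 'key_column' column
df1 = pd.DataFrame({'key_column': [1, 2, 3], 'name': ['Alice', 'Bob', 'Claire']})
df2 = pd.DataFrame({'key_column': [1, 2, 4], 'score': [88, 92, 75]})

# Default how='inner': keep only keys present in both tables (1 and 2)
inner = pd.merge(df1, df2, on='key_column')

# how='left': keep every row of df1; keys missing from df2 get NaN scores
left = pd.merge(df1, df2, on='key_column', how='left')

print(inner)
print(left)
```

In the left join, key 3 has no match in `df2`, so its `score` is `NaN`. The column is upcast to float to hold that missing value.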

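6. Pivoting Data:
`df.pivot(index='index_column', columns='column_to_pivot', values='value_column')`: Reshape data from long to wide format. Unique values of `index` become rows, unique values of `columns` become columns, and the cells are filled from `values`. Each index/column pair must be unique.

   pivoted_df = df.pivot(index='A', columns='B', values='C')  # assumes df has columns 'A', 'B' and 'C'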
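The `df` used earlier has no column `C`, so here is a self-contained sketch on hypothetical long-format temperature data. There is one row per (date, city) pair.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, city) pair
df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['Paris', 'Rome', 'Paris', 'Rome'],
    'temp': [5, 12, 7, 11],
})

# Each unique 'date' becomes a row, each unique 'city' becomes a column,
# and the cells are filled from 'temp'
pivoted_df = df.pivot(index='date', columns='city', values='temp')
print(pivoted_df)
```

If the same (date, city) pair appeared twice, `pivot` would raise a `ValueError`. In that case, `pd.pivot_table` with an `aggfunc` (such as `'mean'`) combines the duplicates instead.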


7. Handling Missing Data:
`df.dropna()`: Remove rows that contain any missing values.
`df.fillna(value)`: Replace missing values with a specified value.

   cleaned_df = df.dropna()
   filled_df = df.fillna(0)
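A short sketch on a hypothetical DataFrame with a few `NaN` values. It shows the default `dropna` behaviour, the `how='all'` variant, and filling each column with its own replacement value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, np.nan]})

# Default: drop any row containing at least one missing value
cleaned_df = df.dropna()

# Drop a row only if all of its values are missing
cleaned_all = df.dropna(how='all')

# Replace every missing value with 0
filled_df = df.fillna(0)

# Fill each column with its own value: the column mean for 'A', 0 for 'B'
filled_per_col = df.fillna({'A': df['A'].mean(), 'B': 0})

print(cleaned_df)
print(filled_per_col)
```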


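8. Applying Functions:
`df.apply(func)`: Apply a function along an axis of the DataFrame: to each column by default, or to each row with `axis=1`. `Series.apply(func)`, as in `df['A'].apply(func)`, calls the function on each element of a single column.

   def square(x):
       return x ** 2

   squared_values = df['A'].apply(square)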
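The difference between element-wise, column-wise, and row-wise application is easiest to see side by side. This sketch uses the same `A`/`B` data as the filtering example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

def square(x):
    return x ** 2

# Series.apply: the function is called once per element of column 'A'
squared_values = df['A'].apply(square)

# DataFrame.apply with the default axis=0: the function receives each column
col_sums = df.apply(lambda col: col.sum())

# axis=1: the function receives each row as a Series
row_totals = df.apply(lambda row: row['A'] + row['B'], axis=1)

print(squared_values)
print(col_sums)
print(row_totals)
```

For simple arithmetic like this, vectorized expressions such as `df['A'] ** 2` or `df['A'] + df['B']` give the same result and are usually faster. `apply` is most useful when the logic cannot be expressed that way.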

9. Reshaping Data:
`pd.melt(df, id_vars=['id_columns'], value_vars=['value_columns'])`: Unpivot a DataFrame from wide to long format.

   melted_df = pd.melt(df, id_vars=['ID'], value_vars=['Var1', 'Var2'])  # assumes df has columns 'ID', 'Var1' and 'Var2'
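Here is a self-contained version of the `melt` call, assuming a hypothetical wide table with columns `ID`, `Var1`, and `Var2`.

```python
import pandas as pd

# Hypothetical wide table: one row per ID, one column per variable
df = pd.DataFrame({'ID': [1, 2], 'Var1': [10, 20], 'Var2': [30, 40]})

# Long format: one row per (ID, variable) pair; the default output
# column names are 'variable' and 'value'
melted_df = pd.melt(df, id_vars=['ID'], value_vars=['Var1', 'Var2'])
print(melted_df)
```

The `var_name` and `value_name` parameters of `pd.melt` can replace the default `variable`/`value` column names.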

## 6

Both Series and DataFrame are mutable. A pandas Series is mutable, which means we can modify its elements after it is created. We can change values at specific index labels, add or remove elements, and perform various operations to alter the Series. A pandas DataFrame is also mutable. We can modify its columns, add or remove rows or columns, update values, and perform various data manipulation tasks on it. Note, however, that most methods (for example `sort_values`, `dropna`, or `drop`) return a new object by default and leave the original unchanged.
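A minimal sketch of these modifications on small hypothetical objects. The last two lines illustrate that a method like `sort_values` returns a new object rather than modifying the original.

```python
import pandas as pd

# Series: change a value, add an element, remove an element
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s.loc['b'] = 99        # change an existing value in place
s.loc['d'] = 40        # assigning to a new label adds an element
s = s.drop('a')        # drop returns a new Series without label 'a'

# DataFrame: add a column, update a value, remove a row
df = pd.DataFrame({'A': [1, 2, 3]})
df['B'] = df['A'] * 10     # add a new column
df.loc[0, 'A'] = 100       # update a single value in place
df = df.drop(index=2)      # drop returns a new DataFrame without row 2

# Most methods return a new object and leave the original untouched
original = pd.DataFrame({'A': [3, 1, 2]})
sorted_copy = original.sort_values('A')

print(s)
print(df)
```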

## 7

In [6]:
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], name='A')
s2 = pd.Series(['apple', 'banana', 'cherry', 'date'], name='B')
s3 = pd.Series([10.5, 20.2, 5.0, 15.8], name='C')

# Combine the Series into a DataFrame; the dict keys become the column names
df = pd.DataFrame({'Column_A': s1, 'Column_B': s2, 'Column_C': s3})


print(df)


   Column_A Column_B  Column_C
0         1    apple      10.5
1         2   banana      20.2
2         3   cherry       5.0
3         4     date      15.8
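As an alternative sketch, `pd.concat` with `axis=1` uses each Series' `name` (`A`, `B`, `C`) as its column name. The second part illustrates that Series are aligned on their index labels when combined. Labels missing from one Series become `NaN`. The extra Series `short` is hypothetical.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], name='A')
s2 = pd.Series(['apple', 'banana', 'cherry', 'date'], name='B')
s3 = pd.Series([10.5, 20.2, 5.0, 15.8], name='C')

# pd.concat along axis=1 uses each Series' name as its column name
df_concat = pd.concat([s1, s2, s3], axis=1)
print(df_concat)

# Series are aligned on their index labels; a missing label becomes NaN
short = pd.Series([100, 200], index=[0, 2], name='D')
aligned = pd.DataFrame({'A': s1, 'D': short})
print(aligned)
```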
