# Task 1: Series and DataFrame Basics

### 1.Series Creation:
- #### Create a Series containing five stock prices with custom index labels for company tickers (e.g., 'AAPL', 'GOOG', etc.).
- #### Print its index, values, and dtype.


In [3]:
import numpy as np
import pandas as pd
stocks = pd.Series([210, 91, 35, 300, 100], index=['AAPL', 'GOOG', 'TSLA', 'AMZN', 'JPN'])
print(stocks.index)
print(stocks.values)
print(stocks.dtype)

Index(['AAPL', 'GOOG', 'TSLA', 'AMZN', 'JPN'], dtype='object')
[210  91  35 300 100]
int64


### 2.DataFrame Creation:

- #### Construct a DataFrame using a dictionary of lists, with columns: 'Name', 'Department', 'Salary'.
- #### Set custom row labels for employee IDs (e.g., 'EMP001', ...).

In [106]:
data = {
    "Name": ["Krishna", "Mani", "Bhagya", "Satish", "Vijay"],
    "Department": ["Accounts", "Managment", "Data_analysist", "Team_lead", "Software_dev"],
    "Salary": [70000, 80000, 90000, 100000, 110000],
}
frame = pd.DataFrame(data, index=["EMP001", "EMP002", "EMP003", "EMP004", "EMP005"])
print(frame)

           Name      Department  Salary
EMP001  Krishna        Accounts   70000
EMP002     Mani       Managment   80000
EMP003   Bhagya  Data_analysist   90000
EMP004   Satish       Team_lead  100000
EMP005    Vijay    Software_dev  110000


### 3.Inspect the Data:

- #### Use .head(), .tail(), .shape, and .dtypes to inspect the DataFrame.

In [18]:
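print(frame.head())
print(frame.tail())
print(f"shape of frame is {frame.shape}")
print(frame.dtypes)

           Name      Department  Salary
EMP001  Krishna        Accounts   70000
EMP002     Mani       Managment   80000
EMP003   Bhagya  Data_analysist   90000
EMP004   Satish       Team_lead  100000
EMP005    Vijay    Software_dev  110000
           Name      Department  Salary
EMP001  Krishna        Accounts   70000
EMP002     Mani       Managment   80000
EMP003   Bhagya  Data_analysist   90000
EMP004   Satish       Team_lead  100000
EMP005    Vijay    Software_dev  110000
shape of frame is (5, 3)
Name          object
Department    object
Salary         int64
dtype: object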

# Task 2: Reindexing and Dropping

### 1.Reindex the DataFrame to include an extra row index 'EMP999' and use forward fill (ffill) to populate it.

In [19]:
frame.reindex(index = ["EMP001", "EMP002", "EMP003", "EMP004","EMP005", "EMP999"], method="ffill")

           Name      Department  Salary
EMP001  Krishna        Accounts   70000
EMP002     Mani       Managment   80000
EMP003   Bhagya  Data_analysist   90000
EMP004   Satish       Team_lead  100000
EMP005    Vijay    Software_dev  110000
EMP999    Vijay    Software_dev  110000
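
To see what `method="ffill"` contributes, the sketch below reindexes a small stand-in frame (hypothetical data named `small`, not the `frame` above) with and without it. Without a fill method the new label is filled with `NaN`; with `ffill` it takes the values of the preceding label. Forward fill during reindexing assumes the existing index is sorted, which holds for these IDs.

```python
import pandas as pd

# Hypothetical two-row frame used only for this illustration
small = pd.DataFrame(
    {"Name": ["Krishna", "Mani"], "Salary": [70000, 80000]},
    index=["EMP001", "EMP002"],
)
new_index = ["EMP001", "EMP002", "EMP999"]

# No fill method: the new row is all NaN
no_fill = small.reindex(new_index)

# Forward fill: the new row copies the last preceding label (EMP002)
forward_filled = small.reindex(new_index, method="ffill")

print(no_fill)
print(forward_filled)
```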


### 2.Drop:

- #### The 'Department' column.
- #### The row with employee ID 'EMP003'.

In [23]:
frame.drop(columns ="Department" , index = ["EMP003"])

           Name  Salary
EMP001  Krishna   70000
EMP002     Mani   80000
EMP004   Satish  100000
EMP005    Vijay  110000


# Task 3: Selection and Filtering

### 1.Use .loc[] to select:

- #### The 'Name' and 'Salary' for 'EMP002'.

In [25]:
frame.loc["EMP002",["Name","Salary"]]

Name       Mani
Salary    80000
Name: EMP002, dtype: object

### 2.Use .iloc[] to select:

- #### The first 3 rows.

In [26]:
frame.iloc[:3]

           Name      Department  Salary
EMP001  Krishna        Accounts   70000
EMP002     Mani       Managment   80000
EMP003   Bhagya  Data_analysist   90000


### 3.Filter the DataFrame to find:

- #### Employees with salary above 60,000.
- #### Employees in departments matching a list of ['IT', 'Finance'] using isin().

In [28]:
frame[frame["Salary"] > 60000]

           Name      Department  Salary
EMP001  Krishna        Accounts   70000
EMP002     Mani       Managment   80000
EMP003   Bhagya  Data_analysist   90000
EMP004   Satish       Team_lead  100000
EMP005    Vijay    Software_dev  110000


In [31]:
# No department repeats in this DataFrame, and none is 'IT' or 'Finance',
# so filter on departments that actually appear in the Department column.
filtered_frame = frame[frame["Department"].isin(["Accounts", "Data_analysist", "Team_lead", "Software_dev"])]
print(filtered_frame)

           Name      Department  Salary
EMP001  Krishna        Accounts   70000
EMP003   Bhagya  Data_analysist   90000
EMP004   Satish       Team_lead  100000
EMP005    Vijay    Software_dev  110000
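
For reference, the filter with the task's literal list `['IT', 'Finance']` could look like the sketch below. The `staff` frame and its names are hypothetical stand-ins, since the frame above has no such departments. `isin()` matches exactly and is case-sensitive, so the second filter lowercases the column first.

```python
import pandas as pd

# Hypothetical data that does contain the departments named in the task
staff = pd.DataFrame(
    {
        "Name": ["Asha", "Ravi", "Leela", "Kiran"],
        "Department": ["IT", "HR", "Finance", "it"],
    },
    index=["EMP101", "EMP102", "EMP103", "EMP104"],
)

# Exact match: the lowercase 'it' row is not selected
exact = staff[staff["Department"].isin(["IT", "Finance"])]

# Case-insensitive match: normalise the column before calling isin()
case_insensitive = staff[staff["Department"].str.lower().isin(["it", "finance"])]

print(exact)
print(case_insensitive)
```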


# Task 4: Arithmetic and Alignment

### 1.Create a Series of bonuses indexed by employee IDs, with one ID missing and one extra.

In [33]:
bonus = pd.Series([500,2000,4000,5000,6000], index = ["EMP001", "EMP002","EMP004","EMP005","EMP006"])
bonus

EMP001     500
EMP002    2000
EMP004    4000
EMP005    5000
EMP006    6000
dtype: int64

### 2.Add the bonus Series to the 'Salary' column and observe alignment. Use fill_value=0 to replace NaN with 0.

In [48]:
increment_frame = frame.loc[:,"Salary"].add(bonus, fill_value = 0)
increment_frame

EMP001     70500.0
EMP002     82000.0
EMP003     90000.0
EMP004    104000.0
EMP005    115000.0
EMP006      6000.0
dtype: float64
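
To observe the alignment itself, it may help to compare the plain `+` operator with `.add(..., fill_value=0)`. The sketch below uses two small stand-in Series (hypothetical values) so that it runs on its own: labels present on only one side become `NaN` under `+`, whereas `fill_value=0` treats the missing side as 0.

```python
import pandas as pd

# Stand-in Series: EMP003 has no bonus, EMP006 has no salary
salary = pd.Series([70000, 80000, 90000], index=["EMP001", "EMP002", "EMP003"])
bonus = pd.Series([500, 2000, 6000], index=["EMP001", "EMP002", "EMP006"])

# Plain addition aligns on the union of labels; unmatched labels give NaN
plain_sum = salary + bonus

# fill_value=0 substitutes 0 for the side that lacks the label
filled_sum = salary.add(bonus, fill_value=0)

print(plain_sum)
print(filled_sum)
```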

### 3.Calculate the percentage increase using broadcasting.

In [68]:
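salary = frame.loc[:, "Salary"]
# Percentage increase = (new - old) / old * 100; the Series align on the index
percentage_increase = (increment_frame - salary) / salary * 100
percentage_increase.round(2)

EMP001    0.71
EMP002    2.50
EMP003    0.00
EMP004    4.00
EMP005    4.55
EMP006     NaN
dtype: float64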

# Task 5: Function Application

### 1.Apply a lambda function to:

- #### Convert all employee names to uppercase.


In [95]:
case_change = frame["Name"].apply(lambda x: x.upper())
case_change

EMP001    KRISHNA
EMP002       MANI
EMP003     BHAGYA
EMP004     SATISH
EMP005      VIJAY
Name: Name, dtype: object

### 2.Use .apply() to:

- #### Create a new column 'Net Salary' by applying a tax deduction of 10% to the 'Salary'.

In [107]:
frame["Net Salary"] = frame["Salary"].apply(lambda x: x - (0.10 * x))
frame

           Name      Department  Salary  Net Salary
EMP001  Krishna        Accounts   70000     63000.0
EMP002     Mani       Managment   80000     72000.0
EMP003   Bhagya  Data_analysist   90000     81000.0
EMP004   Satish       Team_lead  100000     90000.0
EMP005    Vijay    Software_dev  110000     99000.0
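
The task asks for `.apply()`, but the same deduction can also be written as plain column arithmetic, which pandas broadcasts over the whole Series. The sketch below compares the two on a stand-in `salary` Series (hypothetical, so the block runs on its own).

```python
import pandas as pd

# Stand-in for the 'Salary' column
salary = pd.Series([70000, 80000, 90000], index=["EMP001", "EMP002", "EMP003"], name="Salary")

# Element-wise via apply, as in the exercise
via_apply = salary.apply(lambda x: x - (0.10 * x))

# Vectorised equivalent: the scalar 0.10 is broadcast across the Series
vectorized = salary - 0.10 * salary

# Largest absolute difference between the two approaches
max_difference = (via_apply - vectorized).abs().max()
print(max_difference)
```

On large frames the vectorised form is usually faster, because it avoids calling a Python function once per row.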


# Task 6: Sorting and Ranking

### 1.Sort the DataFrame by:

- #### Index (row labels).
- #### Salary in descending order.

In [108]:
frame.sort_index()

           Name      Department  Salary  Net Salary
EMP001  Krishna        Accounts   70000     63000.0
EMP002     Mani       Managment   80000     72000.0
EMP003   Bhagya  Data_analysist   90000     81000.0
EMP004   Satish       Team_lead  100000     90000.0
EMP005    Vijay    Software_dev  110000     99000.0


In [110]:
frame.sort_values(["Salary"] ,ascending = False)

           Name      Department  Salary  Net Salary
EMP005    Vijay    Software_dev  110000     99000.0
EMP004   Satish       Team_lead  100000     90000.0
EMP003   Bhagya  Data_analysist   90000     81000.0
EMP002     Mani       Managment   80000     72000.0
EMP001  Krishna        Accounts   70000     63000.0


### 2. Add a 'Salary Rank' column using .rank() (use method='min' for tie breaking).

In [120]:
frame["Salary Rank"] = frame["Salary"].rank( method= "min")
frame

           Name      Department  Salary  Net Salary  Salary Rank
EMP001  Krishna        Accounts   70000     63000.0          1.0
EMP002     Mani       Managment   80000     72000.0          2.0
EMP003   Bhagya  Data_analysist   90000     81000.0          3.0
EMP004   Satish       Team_lead  100000     90000.0          4.0
EMP005    Vijay    Software_dev  110000     99000.0          5.0
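
Because every salary in this frame is distinct, `method='min'` has no visible effect here. The sketch below uses a hypothetical Series with one tie to show what the option changes: tied values share the lowest rank of their group, and the next rank is skipped. It also passes `ascending=False`, an optional choice that gives rank 1 to the highest salary.

```python
import pandas as pd

# Hypothetical salaries with a tie between B and C
salaries = pd.Series([70000, 80000, 80000, 90000], index=["A", "B", "C", "D"])

# method='min': tied values both receive the lowest rank in the group
min_rank = salaries.rank(method="min", ascending=False)

# Default method='average': tied values receive the mean of their ranks
avg_rank = salaries.rank(ascending=False)

print(min_rank)
print(avg_rank)
```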


# Task 7: Duplicate Indexes

### 1.Create a Series with duplicate index labels.

In [124]:
obj = pd.Series(np.arange(6), index = ["a", "b", "b", "c", "a", "d"])

### 2.Use .loc[] on the Series to retrieve all values for a duplicate index.

In [127]:
obj.loc[["a","b"]]

a    0
a    4
b    1
b    2
dtype: int64

### 3.Comment on what happens when applying .mean() and .sum() on this Series.

In [128]:
obj.sum()


15

In [129]:
obj.mean()
# Duplicate index labels do not affect reductions: .sum() and .mean() aggregate
# all six values regardless of their labels, and each returns a single scalar.

2.5
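
If a per-label result is wanted instead of one scalar, one option is to group on the index level. The sketch below rebuilds the same Series so that it runs on its own, checks for duplicates with `index.is_unique`, and sums the values that share a label.

```python
import numpy as np
import pandas as pd

# Same Series as in the exercise
obj = pd.Series(np.arange(6), index=["a", "b", "b", "c", "a", "d"])

# False here, because 'a' and 'b' each appear twice
has_unique_labels = obj.index.is_unique

# Sum the values that share a label: one row per distinct label
per_label_sum = obj.groupby(level=0).sum()

print(has_unique_labels)
print(per_label_sum)
```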

# Task 8: Descriptive and Statistical Summary

### 1.Use .describe() on the numeric columns.

In [130]:
frame[["Salary","Net Salary", "Salary Rank"]].describe()

             Salary    Net Salary  Salary Rank
count           5.0           5.0          5.0
mean        90000.0       81000.0          3.0
std    15811.388301  14230.249471     1.581139
min         70000.0       63000.0          1.0
25%         80000.0       72000.0          2.0
50%         90000.0       81000.0          3.0
75%        100000.0       90000.0          4.0
max        110000.0       99000.0          5.0


### 2.Calculate:

- #### Mean, standard deviation, and cumulative sum of the 'Salary' column.
- #### Index label with the maximum and minimum salary using .idxmax() / .idxmin().

In [135]:
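# Only the last expression in a cell is displayed, so print each result
print(f"mean: {frame['Salary'].mean():.2f}")
print(f"std: {frame['Salary'].std():.2f}")
print(frame["Salary"].cumsum())

mean: 90000.00
std: 15811.39
EMP001     70000
EMP002    150000
EMP003    240000
EMP004    340000
EMP005    450000
Name: Salary, dtype: int64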

In [137]:
print("max salary:", frame["Salary"].idxmax())
print("min salary:", frame["Salary"].idxmin())

max salary: EMP005
min salary: EMP001

# Task 9: Correlation and Categorical Analysis

### 1.Create a new DataFrame with two numerical columns (e.g., 'Experience', 'Salary').

In [144]:
data2 = {"Experience":[1,2,3,4,5], "Salary":[ 20000,3000,40000,50000,60000]}
frame2 = pd.DataFrame(data2)
frame2


   Experience  Salary
0           1   20000
1           2    3000
2           3   40000
3           4   50000
4           5   60000


### 2.Compute:

- #### Correlation and covariance matrix using .corr() and .cov().

In [145]:
frame2.corr()

            Experience    Salary
Experience         1.0  0.871582
Salary        0.871582       1.0


In [146]:
frame2.cov()

            Experience       Salary
Experience         2.5      31750.0
Salary         31750.0  530800000.0
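
The two matrices are related: the Pearson correlation is the covariance divided by the product of the two standard deviations. As a quick check under that definition, the sketch below rebuilds `frame2` with the same values and recomputes the off-diagonal correlation by hand.

```python
import pandas as pd

# Same data as frame2 in the exercise
frame2 = pd.DataFrame(
    {"Experience": [1, 2, 3, 4, 5], "Salary": [20000, 3000, 40000, 50000, 60000]}
)

# Covariance between the two columns (sample covariance, ddof=1)
cov_xy = frame2["Experience"].cov(frame2["Salary"])

# Pearson correlation = cov(x, y) / (std(x) * std(y))
manual_corr = cov_xy / (frame2["Experience"].std() * frame2["Salary"].std())

print(round(manual_corr, 6))
```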


### 3.Use .value_counts() to:

- #### Count the frequency of unique values in a categorical column (e.g., 'Department').

In [148]:
# The Department column has no repeated values, so a custom Series with duplicates is used instead.
series = pd.Series(["a","a","a","d","f","g","g"])
series.value_counts()

a    3
g    2
d    1
f    1
Name: count, dtype: int64
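
Applied to a categorical column such as 'Department', the same call could look like the sketch below. The `departments` Series is hypothetical, since the frame in this notebook has no repeated departments. Passing `normalize=True` returns proportions instead of raw counts.

```python
import pandas as pd

# Hypothetical Department column with repeated values
departments = pd.Series(["IT", "Finance", "IT", "HR", "IT", "Finance"], name="Department")

# Frequency of each unique value, most frequent first
counts = departments.value_counts()

# Share of each value as a fraction of all rows
shares = departments.value_counts(normalize=True)

print(counts)
print(shares)
```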