# Built-in Data Types to store Collections of Data

<img src="https://media.geeksforgeeks.org/wp-content/uploads/20191023173512/Python-data-structure.jpg" alt="Python Data-Types" />

Collections in Python are container data types: lists, sets, tuples, and dictionaries.<BR>
They differ in how they are declared and how they are used.<BR>
<ul>
    <li>A list is declared with square brackets. It is ordered and mutable, allows duplicate values, and its elements can be accessed by index.</li>
    <li>A tuple is declared with round brackets. It is ordered and immutable, and it allows duplicate entries.</li>
    <li>A set is declared with curly brackets. It is unordered, is not indexed, and does not keep duplicate entries.</li>
    <li>A dictionary stores key-value pairs and is mutable. It is declared with curly brackets; square brackets are used to look up a value by its key.</li>
</ul>
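
The characteristics listed above can be checked with a few lines of plain Python. This is only a minimal sketch; the variable names (`items_list`, `items_tuple`, `items_set`, `items_dict`) are made up for illustration.

```python
# One small example of each built-in collection type.
items_list = [3, 1, 3]           # list: ordered, mutable, keeps duplicates
items_tuple = (3, 1, 3)          # tuple: ordered, immutable, keeps duplicates
items_set = {3, 1, 3}            # set: unordered, the repeated 3 is stored once
items_dict = {"a": 1, "b": 2}    # dict: key-value pairs, mutable

items_list.append(4)             # allowed: a list can grow
items_dict["c"] = 3              # allowed: a dict accepts new keys

print(items_list[0])             # 3 (access by index)
print(items_tuple.count(3))      # 2 (duplicates are kept)
print(len(items_set))            # 2 (duplicates are dropped)
print(items_dict["c"])           # 3 (lookup by key)
```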

<U><H1>SETS</H1></U>

Sets are used to store multiple items in a single variable.<BR>
Set items are :- 
- unordered, 
- unchangeable (an item cannot be modified in place, although items can be added to or removed from the set), 
- do not allow duplicate values.

Sets are written with curly brackets.

In [3]:
dataset = {1,2,3,4}
print("dataset -> ",dataset,"   ",type(dataset))

dataset ->  {1, 2, 3, 4}     <class 'set'>
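
A short sketch of how a set behaves in practice. The name `tags` and its values are hypothetical and chosen only for illustration.

```python
# Duplicates collapse: "red" is stored only once.
tags = {"red", "green", "red"}

tags.add("blue")         # items can be added
tags.discard("green")    # and removed

print(len(tags))         # 2
print("red" in tags)     # True (membership test)

# Sets have no positions, so indexing raises TypeError.
try:
    tags[0]
except TypeError as err:
    print("sets are not indexable:", err)
```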


<U><H1>Lists</H1></U>

Lists are used to store multiple items in a single variable.<BR>
List items are :-
- ordered, 
- changeable, 
- allow duplicate values.

List items are indexed, the first item has index [0], the second item has index [1] etc.<BR>
Lists are written with square brackets.

In [4]:
datalist = [1,2,3,4]
print("datalist -> ",datalist,"   ",type(datalist))

datalist ->  [1, 2, 3, 4]     <class 'list'>
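
The three list properties above (ordered, changeable, duplicates allowed) can be seen in a small sketch. The name `marks` and its values are invented for this example.

```python
marks = [70, 85, 85, 90]

marks[0] = 75               # changeable: replace an item by index
marks.append(60)            # changeable: add an item at the end

print(marks[1], marks[-1])  # 85 60 (index 1, and the last item)
print(marks.count(85))      # 2 (duplicates are kept)
print(marks[1:3])           # [85, 85] (slicing keeps the order)
```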


<U><H1>TUPLES</H1></U>

Tuples are used to store multiple items in a single variable.<BR>
Tuple items are :-
1. ordered, 
2. unchangeable, 
3. allow duplicate values.

Tuples are written with round brackets.

In [5]:
datatuple = (1,"a",3,4)
print("datatuple -> ",datatuple,"   ",type(datatuple))

datatuple ->  (1, 'a', 3, 4)     <class 'tuple'>
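
As a rough illustration of what "unchangeable" means for tuples, the sketch below tries to assign to an item and catches the resulting error. The names `point`, `single` and `not_a_tuple` are hypothetical.

```python
point = (1, "a", 3, 4)
print(point[1])             # a (items are read by index, like a list)

# Assigning to an item is not allowed.
try:
    point[1] = "b"
except TypeError as err:
    print("tuples are immutable:", err)

# A one-item tuple needs a trailing comma; brackets alone are not enough.
single = (5,)
not_a_tuple = (5)
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple int
```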


<U><H1>DICTIONARIES</H1></U>

Dictionaries are used to store data values in key:value pairs.<BR>
Dictionary items are :-
1. ordered (insertion order is preserved as of Python 3.7), 
2. changeable, 
3. do not allow duplicate keys (values may repeat).

Dictionaries are written with curly brackets, and have keys and values.

In [6]:
datadictionary = {1:"One", 2:"Two", 3:"Three", 4:"Four"}
print("datadictionary -> ",datadictionary,"   ",type(datadictionary))

datadictionary ->  {1: 'One', 2: 'Two', 3: 'Three', 4: 'Four'}     <class 'dict'>
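
A small sketch of the key behaviour described above. The dictionary `numbers` is a made-up example that repeats key `1` on purpose, to show that the later value replaces the earlier one.

```python
# Key 1 appears twice: the later value replaces the earlier one.
numbers = {1: "One", 2: "Two", 1: "Uno"}

numbers[3] = "Three"               # changeable: add a new key

print(numbers)                     # {1: 'Uno', 2: 'Two', 3: 'Three'}
print(list(numbers))               # [1, 2, 3] (keys in insertion order)
print(numbers.get(9, "missing"))   # missing (default for an absent key)
```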


# DATASET CREATION and MANIPULATION

First, we import the pandas and NumPy libraries.

In [7]:
import pandas as pd
import numpy as np

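Viewing the pre-defined dictionary as a pandas Series with an explicit index. Index labels that are not keys of the dictionary (5 and 6 here) are filled with NaN.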

In [11]:
df = pd.Series(datadictionary, index = [1,2,3,4,5,6])
df

1      One
2      Two
3    Three
4     Four
5      NaN
6      NaN
dtype: object
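
The NaN rows in the output come from index labels that have no matching dictionary key. A minimal sketch of counting and removing them is shown here; the names `words`, `s` and `present` are assumptions made for the example.

```python
import pandas as pd

words = {1: "One", 2: "Two", 3: "Three", 4: "Four"}

# Labels 5 and 6 are not keys of the dictionary, so they become NaN.
s = pd.Series(words, index=[1, 2, 3, 4, 5, 6])
print(s.isna().sum())          # 2

# dropna() keeps only the labels that had a value.
present = s.dropna()
print(list(present.index))     # [1, 2, 3, 4]
```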

# Student Data

#### Sno. | RollNo. | Name | Branch | Sem

A synthetic dataset is created for performing the data manipulation tasks.<br>
The dataset is a collection of student data with the following features :-
<ul>
    <li>S No.</li>
    <li>Roll No.</li>
    <li>Name</li>
    <li>Branch</li>
    <li>Sem</li>
</ul>

In [12]:
## Function to take data entries from the user

def datacreate(dictionary):
    # Keep adding one student record per pass until the user stops.
    while True:
        s_no = input("Enter S_No. : ")
        roll_no = input("Enter Roll_No. : ")
        name = input("Enter Name : ")
        branch = input("Enter Branch : ")
        sem = input("Enter Sem : ")

        # Each record is stored under the next free integer key.
        dictionary[len(dictionary)] = {'S No.' : s_no,
                                       'R No.' : roll_no,
                                       'Name' : name,
                                       'Branch' : branch,
                                       'Sem' : sem}

        ch = input("Enter More? (Y/N) : ")
        if ch != 'Y':
            break
    return dictionary
    

In [13]:
## Dictionary creation and conversion to pandas dataframe

dictionary={}
dictionary=datacreate(dictionary)
df = pd.DataFrame(dictionary).T
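
To see why the transpose (`.T`) is needed, here is a small sketch with a hand-written dictionary of records in the same shape as the one built by `datacreate`. The records and the names `records`, `wide`, `tall` and `alt` are invented for illustration.

```python
import pandas as pd

# Outer keys are record numbers, inner keys are field names.
records = {
    0: {"S No.": 1, "Name": "Student A", "Sem": "I"},
    1: {"S No.": 2, "Name": "Student B", "Sem": "II"},
}

wide = pd.DataFrame(records)       # outer keys become columns
tall = pd.DataFrame(records).T     # transposed: one row per record

# from_dict with orient="index" builds the row-per-record layout directly.
alt = pd.DataFrame.from_dict(records, orient="index")

print(wide.shape, tall.shape)      # (3, 2) (2, 3)
print(list(alt.columns))           # ['S No.', 'Name', 'Sem']
```

`pd.DataFrame.from_dict(..., orient="index")` is one possible alternative to the transpose; which of the two reads better is a matter of taste.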

In [16]:
## Exporting the acquired dataframe into an Excel file

file_name = 'StudentData.xlsx'
df.to_excel(file_name)
print('DataFrame is written to Excel File successfully.')

DataFrame is written to Excel File successfully.


# Working on Saved Data

## Import 
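The dataset is imported from the saved file using the pandas.read_excel() function (for Excel files). Because the dataframe index was written to the file along with the data, it comes back as an extra column named "Unnamed: 0".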

In [80]:
## importing the dataset
df=pd.read_excel(file_name)
df

| | Unnamed: 0 | S No. | R No. | Name | Branch | Sem |
|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1000015445 | Manimit Haldar | B.Tech CSE | VI |
| 1 | 1 | 2 | 1000015575 | Tanya Rajpoot | B.A English (Hons.) | V |
| 2 | 2 | 3 | 1000014215 | Rishi Dwivedi | B.Tech CSE | VI |
| 3 | 3 | 4 | 1000015577 | Priyanka Garg | B.Sc Physics | III |
| 4 | 4 | 5 | 1000015049 | Raj Krishna | Ph.D Psychology | I |
| 5 | 5 | 6 | 1000014484 | Srijan Kunwar | M.Tech IT | II |
| 6 | 6 | 7 | 1000015489 | Shreya Umrao | B.Pharm | IV |
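
The extra "Unnamed: 0" column most likely appears because the index was saved with the data and then read back as an ordinary column. The sketch below reproduces the effect with an in-memory CSV round trip, so it needs no Excel engine. The frame and the names `frame`, `with_index` and `clean` are assumptions for this example; `to_excel` accepts the same `index=False` parameter, and `read_excel` accepts `index_col=0`.

```python
import io
import pandas as pd

frame = pd.DataFrame({"Name": ["A", "B"], "Sem": ["I", "II"]})

# Default: the index is written as an unnamed first column.
buf = io.StringIO()
frame.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)
print(list(with_index.columns))    # ['Unnamed: 0', 'Name', 'Sem']

# index=False leaves the index out of the file.
buf = io.StringIO()
frame.to_csv(buf, index=False)
buf.seek(0)
clean = pd.read_csv(buf)
print(list(clean.columns))         # ['Name', 'Sem']
```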


## Append
Adding new entries to the dataframe

In [81]:
df1 = pd.DataFrame({'S No.' : [8,9],
                    'R No.' : [8,9],
                    'Name' : ['Test1','Test2'],
                    'Branch' : ['Branch1','Branch2'],
                    'Sem' : ['Sem1','Sem2']}, index=[8,9])
# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; pd.concat does the same job.
df = pd.concat([df, df1])
df


| | Unnamed: 0 | S No. | R No. | Name | Branch | Sem |
|---|---|---|---|---|---|---|
| 0 | 0.0 | 1 | 1000015445 | Manimit Haldar | B.Tech CSE | VI |
| 1 | 1.0 | 2 | 1000015575 | Tanya Rajpoot | B.A English (Hons.) | V |
| 2 | 2.0 | 3 | 1000014215 | Rishi Dwivedi | B.Tech CSE | VI |
| 3 | 3.0 | 4 | 1000015577 | Priyanka Garg | B.Sc Physics | III |
| 4 | 4.0 | 5 | 1000015049 | Raj Krishna | Ph.D Psychology | I |
| 5 | 5.0 | 6 | 1000014484 | Srijan Kunwar | M.Tech IT | II |
| 6 | 6.0 | 7 | 1000015489 | Shreya Umrao | B.Pharm | IV |
| 8 | NaN | 8 | 8 | Test1 | Branch1 | Sem1 |
| 9 | NaN | 9 | 9 | Test2 | Branch2 | Sem2 |
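
A minimal sketch of appending rows with `pd.concat`, using two small made-up frames (`base` and `extra`). It shows that a column missing from one frame is filled with NaN, as in the output above, and that `ignore_index=True` renumbers the rows.

```python
import pandas as pd

base = pd.DataFrame({"S No.": [1, 2], "Name": ["A", "B"], "Note": ["x", "y"]})
extra = pd.DataFrame({"S No.": [3], "Name": ["C"]})   # no "Note" column

# ignore_index=True gives the result a fresh 0..n-1 index.
combined = pd.concat([base, extra], ignore_index=True)

print(list(combined.index))               # [0, 1, 2]
print(combined["Name"].tolist())          # ['A', 'B', 'C']
print(combined["Note"].isna().tolist())   # [False, False, True]
```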


## Fill NAN
Filling NA values with 0s 

In [82]:
df=df.fillna(0)
df

| | Unnamed: 0 | S No. | R No. | Name | Branch | Sem |
|---|---|---|---|---|---|---|
| 0 | 0.0 | 1 | 1000015445 | Manimit Haldar | B.Tech CSE | VI |
| 1 | 1.0 | 2 | 1000015575 | Tanya Rajpoot | B.A English (Hons.) | V |
| 2 | 2.0 | 3 | 1000014215 | Rishi Dwivedi | B.Tech CSE | VI |
| 3 | 3.0 | 4 | 1000015577 | Priyanka Garg | B.Sc Physics | III |
| 4 | 4.0 | 5 | 1000015049 | Raj Krishna | Ph.D Psychology | I |
| 5 | 5.0 | 6 | 1000014484 | Srijan Kunwar | M.Tech IT | II |
| 6 | 6.0 | 7 | 1000015489 | Shreya Umrao | B.Pharm | IV |
| 8 | 0.0 | 8 | 8 | Test1 | Branch1 | Sem1 |
| 9 | 0.0 | 9 | 9 | Test2 | Branch2 | Sem2 |
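
Filling every gap with 0 works here because only a numeric column has missing values. When columns need different fill values, `fillna` also accepts a dictionary keyed by column name. The sketch below uses a made-up frame; `frame` and `filled` are hypothetical names.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"Score": [1.0, np.nan], "Branch": ["CSE", None]})

# One fill value per column instead of a single value for the whole frame.
filled = frame.fillna({"Score": 0, "Branch": "Unknown"})

print(filled["Score"].tolist())     # [1.0, 0.0]
print(filled["Branch"].tolist())    # ['CSE', 'Unknown']
```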


## Drop by row
Dropping unwanted entries from the data

In [83]:
df=df.drop(df.index[[8,7]], inplace=False).reset_index()
df

| | index | Unnamed: 0 | S No. | R No. | Name | Branch | Sem |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 1 | 1000015445 | Manimit Haldar | B.Tech CSE | VI |
| 1 | 1 | 1.0 | 2 | 1000015575 | Tanya Rajpoot | B.A English (Hons.) | V |
| 2 | 2 | 2.0 | 3 | 1000014215 | Rishi Dwivedi | B.Tech CSE | VI |
| 3 | 3 | 3.0 | 4 | 1000015577 | Priyanka Garg | B.Sc Physics | III |
| 4 | 4 | 4.0 | 5 | 1000015049 | Raj Krishna | Ph.D Psychology | I |
| 5 | 5 | 5.0 | 6 | 1000014484 | Srijan Kunwar | M.Tech IT | II |
| 6 | 6 | 6.0 | 7 | 1000015489 | Shreya Umrao | B.Pharm | IV |
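
The "index" column in the output above is the old index, which `reset_index()` keeps as a regular column. A small sketch with a made-up frame shows dropping rows by label or by position, and how `reset_index(drop=True)` avoids the extra column. All names here are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({"Name": ["A", "B", "C", "D"]}, index=[0, 1, 8, 9])

# Drop by index label, or turn positions into labels first.
by_label = frame.drop(index=[8, 9])
by_position = frame.drop(frame.index[[2, 3]])
print(by_label.equals(by_position))     # True

# reset_index() keeps the old index as a column; drop=True discards it.
kept_old = frame.drop(index=[1]).reset_index()
fresh = frame.drop(index=[1]).reset_index(drop=True)
print(list(kept_old.columns))           # ['index', 'Name']
print(list(fresh.columns))              # ['Name']
```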


## Drop by column
Dropping unwanted features from the data

In [84]:
df=df.drop(["Unnamed: 0","index"], axis=1)
df

| | S No. | R No. | Name | Branch | Sem |
|---|---|---|---|---|---|
| 0 | 1 | 1000015445 | Manimit Haldar | B.Tech CSE | VI |
| 1 | 2 | 1000015575 | Tanya Rajpoot | B.A English (Hons.) | V |
| 2 | 3 | 1000014215 | Rishi Dwivedi | B.Tech CSE | VI |
| 3 | 4 | 1000015577 | Priyanka Garg | B.Sc Physics | III |
| 4 | 5 | 1000015049 | Raj Krishna | Ph.D Psychology | I |
| 5 | 6 | 1000014484 | Srijan Kunwar | M.Tech IT | II |
| 6 | 7 | 1000015489 | Shreya Umrao | B.Pharm | IV |
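
An equivalent way to drop columns is the `columns=` keyword, which avoids having to remember `axis=1`. The sketch below uses a made-up one-row frame and also shows `errors="ignore"`, which skips columns that are already gone.

```python
import pandas as pd

frame = pd.DataFrame({"Name": ["A"], "Sem": ["I"], "Unnamed: 0": [0]})

# columns= names the axis explicitly.
slim = frame.drop(columns=["Unnamed: 0"])
print(list(slim.columns))           # ['Name', 'Sem']

# Dropping again would raise KeyError; errors="ignore" skips missing labels.
safe = slim.drop(columns=["Unnamed: 0"], errors="ignore")
print(safe.equals(slim))            # True
```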


## Projection using "loc"
The loc attribute is used to project only selected entries (rows) from the dataframe.<br>
It is a purely label-based indexer: rows are selected by their index labels.

In [98]:
df.loc[[0,2,4,5]]

| | S No. | R No. | Name | Branch | Sem |
|---|---|---|---|---|---|
| 0 | 1 | 1000015445 | Manimit Haldar | B.Tech CSE | VI |
| 2 | 3 | 1000014215 | Rishi Dwivedi | B.Tech CSE | VI |
| 4 | 5 | 1000015049 | Raj Krishna | Ph.D Psychology | I |
| 5 | 6 | 1000014484 | Srijan Kunwar | M.Tech IT | II |


## Projection using "iloc"
The iloc attribute is used to project only selected entries (rows) from the dataframe.<br>
It is a purely integer-position based indexer: rows are selected by their position, counting from 0.

In [99]:
df.iloc[3:6]

| | S No. | R No. | Name | Branch | Sem |
|---|---|---|---|---|---|
| 3 | 4 | 1000015577 | Priyanka Garg | B.Sc Physics | III |
| 4 | 5 | 1000015049 | Raj Krishna | Ph.D Psychology | I |
| 5 | 6 | 1000014484 | Srijan Kunwar | M.Tech IT | II |
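
With the default 0..n-1 index, labels and positions coincide, so loc and iloc look interchangeable. The sketch below uses a made-up frame whose index labels are 10, 20, 30, 40 to make the difference visible.

```python
import pandas as pd

frame = pd.DataFrame({"Name": ["A", "B", "C", "D"]}, index=[10, 20, 30, 40])

# loc selects by index label, iloc by integer position.
print(frame.loc[[10, 30], "Name"].tolist())    # ['A', 'C']
print(frame.iloc[[0, 2]]["Name"].tolist())     # ['A', 'C']

# A label slice includes its end point; a position slice does not.
print(frame.loc[20:30, "Name"].tolist())       # ['B', 'C']
print(frame.iloc[1:2]["Name"].tolist())        # ['B']
```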


## Searching 
Searching for an entry in dataframe.

In [105]:
df[(df['Sem'] == 'VI')]

| | S No. | R No. | Name | Branch | Sem |
|---|---|---|---|---|---|
| 0 | 1 | 1000015445 | Manimit Haldar | B.Tech CSE | VI |
| 2 | 3 | 1000014215 | Rishi Dwivedi | B.Tech CSE | VI |


Searching with where(): rows that do not match the condition are replaced with NaN, and dropna() then removes them.

In [109]:
df[['Name','Sem']].where(df['Branch']=='B.Tech CSE').dropna()

| | Name | Sem |
|---|---|---|
| 0 | Manimit Haldar | VI |
| 2 | Rishi Dwivedi | VI |
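
Both searches above rely on a boolean mask. As a hedged sketch on a made-up frame, the example below combines two conditions with `&` and shows that `loc` with a mask selects the same rows as the `where(...).dropna()` pattern. The names `frame`, `mask`, `via_loc` and `via_where` are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Branch": ["CSE", "IT", "CSE"],
    "Sem": ["VI", "II", "V"],
})

# Combine conditions with & (and) or | (or); each needs its own brackets.
mask = (frame["Branch"] == "CSE") & (frame["Sem"] == "VI")
print(frame.loc[mask, "Name"].tolist())    # ['A']

# loc with a mask selects rows and columns in one step.
via_loc = frame.loc[frame["Branch"] == "CSE", ["Name", "Sem"]]

# where() blanks non-matching rows to NaN; dropna() then removes them.
via_where = frame[["Name", "Sem"]].where(frame["Branch"] == "CSE").dropna()

print(via_loc["Name"].tolist())            # ['A', 'C']
print(via_where["Name"].tolist())          # ['A', 'C']
```

One caveat with the `where(...).dropna()` form: `dropna()` also removes matching rows that already contained NaN in the selected columns, which the `loc` form does not.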
