# Data Storage

<br>
<div style="text-align: justify">Besides the int, float, and str data types discussed earlier,
Python provides collection data types that store multiple
entries in a single object. The entries can be numbers, strings,
or a mix of different types. The four built-in collection data
types of Python are given below.</div>
<br>

- List: It is an ordered collection that is changeable. It allows duplicate entries.
- Tuple: It is an ordered collection that is unchangeable. It also allows duplicate entries.
- Set: It is an unordered and unindexed collection. It does not allow duplicate entries, just like mathematical sets.
- Dictionary: It is a changeable collection of key:value pairs indexed by key. Since Python 3.7 it preserves insertion order. It does not allow duplicate keys, although values may repeat.
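
As a quick sketch, the four types can be placed side by side. The variable names below are illustrative only and do not appear elsewhere in this notebook.

```python
# Illustrative names; each literal creates one of the four collection types.
fruits = ["apple", "orange", "apple"]        # list: ordered, mutable, duplicates kept
point = (3, 4, 3)                            # tuple: ordered, immutable, duplicates kept
animals = {"cat", "dog", "cat"}              # set: duplicates collapse to one entry
prices = {"apple": 1.5, "orange": 2.0}       # dictionary: unique keys mapped to values

print(type(fruits).__name__, len(fruits))    # list 3
print(type(point).__name__, len(point))      # tuple 3
print(type(animals).__name__, len(animals))  # set 2
print(type(prices).__name__, len(prices))    # dict 2
```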

### Key Concepts
1. Lists
2. Tuples
3. Sets
4. Dictionaries
5. DataFrame (Continued on another dedicated file)

## Lists

<br>
<div style="text-align: justify">A list is an ordered and changeable collection of elements. In
Python, lists are written with square brackets. For example, to
create a list named fruitlist, type the following code:</div>

In [1]:
fruitlist = ["Apple", "Orange", "Banana", "Rambutan"]
print(fruitlist)


['Apple', 'Orange', 'Banana', 'Rambutan']


<div style="text-align: justify">We can access the items/elements of a list by referring to the
index number. Indexing starts at 0, so to print the second item, "Orange",
of the list, we type the following code:</div>

In [2]:
print(fruitlist[1])

Orange


<div style="text-align: justify">As discussed earlier, Python allows negative indexing. Index
number −1 refers to the last item of the list, −2 pertains to the
second last item, and so on. For example, to print the second
last item, “banana” of the list, type the following code:</div>

In [3]:
fruitlist = ["apple", "orange", "banana", "melon"]
print(fruitlist[-2])


banana


In [4]:
print(fruitlist[2:4])    # Elements at index 2 and 3 but not 4 are accessed. 

['banana', 'melon']


<div style="text-align: justify">Try the following, and observe the output.</div>

In [5]:
print(fruitlist[:3])     # returns list elements from the start to "banana"
print(fruitlist[2:])     # returns elements from "banana" to last element
print(fruitlist[-3:-1])  # returns elements from index -3 up to, but not including, index -1


['apple', 'orange', 'banana']
['banana', 'melon']
['orange', 'banana']
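
The slicing rules used above can be checked directly. The following is a minimal sketch that reuses the same fruitlist; the names n and middle are introduced here only for illustration.

```python
fruitlist = ["apple", "orange", "banana", "melon"]
n = len(fruitlist)

# A negative index i refers to the same element as index n + i.
print(fruitlist[-2] == fruitlist[n - 2])   # True

# The stop index of a slice is excluded, so [-3:-1] equals [1:3].
middle = fruitlist[-3:-1]
print(middle)                              # ['orange', 'banana']
print(middle == fruitlist[1:3])            # True

# Leaving out the stop index is the way to include the last element.
print(fruitlist[-3:])                      # ['orange', 'banana', 'melon']
```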


<div style="text-align: justify">Since lists are mutable, we can change the value of a specific
element by using its index. For example, to change the second
element of the fruitlist, type the following:</div>

In [6]:
fruitlist[1] = "dates"
print(fruitlist)


['apple', 'dates', 'banana', 'melon']


<div style="text-align: justify">We can check if an element is present in the list by using the
    keyword <i>in</i> as follows:</div>

In [7]:
if "apple" in fruitlist:
    print("apple is present in the list")


apple is present in the list


<br>
<img src="Images/datatype1.png" style="margin:auto"/>  
<img src="Images/datatype2.png" style="margin:auto"/>   
<img src="Images/datatype3.png" style="margin:auto"/>   

In [8]:
fruitlist =  ["apple", "orange", "banana", "melon"]

#### .append() adds a new element to the end of the list

In [9]:
fruitlist.append('watermelon') 
fruitlist

['apple', 'orange', 'banana', 'melon', 'watermelon']

#### .insert(index, element) inserts an element into the list at the specified index

In [10]:
fruitlist.insert(1, 'cherry')  

#### .remove() removes the first occurrence of an element from a list

In [11]:
fruitlist.remove("banana")
print(fruitlist)


['apple', 'cherry', 'orange', 'melon', 'watermelon']
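
A short sketch of how insert() and remove() behave at the edges. The list fruits and its contents are made up for this illustration.

```python
fruits = ["apple", "banana", "cherry", "banana"]

fruits.insert(1, "kiwi")      # existing elements from index 1 onward shift right
print(fruits)                 # ['apple', 'kiwi', 'banana', 'cherry', 'banana']

fruits.remove("banana")       # only the first "banana" is removed
print(fruits)                 # ['apple', 'kiwi', 'cherry', 'banana']

try:
    fruits.remove("mango")    # "mango" is not in the list
except ValueError as err:
    print("ValueError:", err)
```

Because remove() raises a ValueError for a missing element, it can help to test membership with the keyword in before calling it.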


#### .clear() empties the whole list while keeping the list object itself

In [12]:
fruitlist.clear()
print(fruitlist)


[]


#### The del keyword can be used to delete the list entirely

In [13]:
#To completely remove a list, use keyword del as below.

del fruitlist # removes fruitlist
fruitlist

NameError: name 'fruitlist' is not defined

#### Use the + operator to join two lists

In [14]:
#Lists can be joined together using + operator as follows.

fruitlist = ["apple" , "orange"]
quantity = [1, 4]
fruit_quantity = fruitlist + quantity
print(fruit_quantity)

['apple', 'orange', 1, 4]
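
The + operator builds a new list and leaves both operands untouched. The list method extend() is a related option that modifies the list in place. A minimal sketch, where the name combined is illustrative:

```python
fruitlist = ["apple", "orange"]
quantity = [1, 4]

combined = fruitlist + quantity   # builds a new list
print(combined)                   # ['apple', 'orange', 1, 4]
print(fruitlist)                  # ['apple', 'orange'] (unchanged)

fruitlist.extend(quantity)        # modifies fruitlist in place
print(fruitlist)                  # ['apple', 'orange', 1, 4]
```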


## Tuples

<br>
<div style="text-align: justify">A tuple is an ordered collection that is unchangeable. Tuples
are written using round brackets () in Python. For example, to
create a tuple, type:</div>

In [15]:
mytuple = ("Python", "for", "Data Science")
print(mytuple)

('Python', 'for', 'Data Science')


<div style="text-align: justify">Similar to a list, we can access the elements of a tuple using
[ ]. For example:</div>

In [16]:
print(mytuple[2])

Data Science


<div style="text-align: justify">Negative indexing and a range of indexing can be used on
tuples, the way we use them for the lists. We cannot change
values present in a tuple once it is created because tuples are
immutable. However, there is a workaround:</div>

1. Convert the tuple into a list,
2. change the list, and
3. convert the list back into a tuple.

In [17]:
mytuple = ("Python", "for", "Data Science")
mylist = list(mytuple)  # list() builds a new list from the elements of mytuple
mylist[1] = "is handy for"  # replace the list element at index 1
mytuple = tuple(mylist)
print(mytuple)

('Python', 'is handy for', 'Data Science')


<div style="text-align: justify">Similar to lists, we can:</div>

- loop through elements of a tuple by using a for loop;
- determine if a specified element is present in a tuple by using the keyword in;
- determine the number of elements of a tuple using the len() function; and
- join two or more tuples by using the + operator.
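
The four operations listed above can be sketched as follows. The tuple extra is a hypothetical second tuple added only for the example.

```python
mytuple = ("Python", "for", "Data Science")

for word in mytuple:                 # loop through the elements
    print(word)

print("for" in mytuple)              # True
print(len(mytuple))                  # 3

extra = ("and", "more")              # hypothetical second tuple
joined = mytuple + extra             # + builds a new tuple
print(joined)                        # ('Python', 'for', 'Data Science', 'and', 'more')
```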

<br>
<div style="text-align: justify">Once a tuple is created, we cannot add items to it. However,
    we can delete the tuple completely using <i>del mytuple</i>.</div>
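
To see immutability in practice, the sketch below tries to assign to a tuple element and records the resulting error. The variable error_name is introduced here only to capture the exception type.

```python
mytuple = ("Python", "for", "Data Science")

error_name = None
try:
    mytuple[1] = "is handy for"        # item assignment is not supported
except TypeError as err:
    error_name = type(err).__name__
print(error_name)                      # TypeError

print(hasattr(mytuple, "append"))      # False: tuples have no append() method

del mytuple                            # the name mytuple no longer exists
```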

## Tuple Methods

<br>
<div style="text-align: justify"><b>count()</b> returns the number of times a specified value occurs
in a tuple.</div>

In [18]:
mytuple = ("Python", "for", "Data Science")
mytuple.count('Python')

1

<div style="text-align: justify"><b>index()</b> searches the tuple for a specified value and returns
the position where it is found. For example,</div>

In [19]:
print(mytuple.index('Data Science'))
print(mytuple.index('Data'))

2


ValueError: tuple.index(x): x not in tuple

<div style="text-align: justify">We get a ValueError if the specified value is not present in the
tuple.</div>

## Sets

<br>
<div style="text-align: justify">A set is an unordered and unindexed collection of elements.
Sets are written with curly brackets { } in Python. For example,</div>

In [20]:
myset = {"cat","tiger","dog","cow"}
print(myset)
## Note how the output strings in a set are not ordered

{'dog', 'tiger', 'cat', 'cow'}


<div style="text-align: justify">Since sets are unordered, there is no index associated with their
elements. However, we can loop through the elements of a set
    using a <i>for loop</i>.</div>

In [21]:
myset = {"cat", "tiger", "dog", "cow"}
for x in myset:
    print(x)

dog
tiger
cat
cow


<div style="text-align: justify">Note there is no order in the printed output. We can also check
if a specified value is present in a set by using the keyword in.
For example,</div>

In [22]:
print("tiger" in myset)
print("lion" in myset)

True
False


<div style="text-align: justify">We cannot change elements of a set once it is created.
    However, we can add new items. We use the method <b>add()</b> to
add one element to a set.</div>

In [23]:
myset.add("lion")
print(myset)

{'tiger', 'lion', 'cat', 'cow', 'dog'}


<div style="text-align: justify">Note there is no order in the output. To add multiple elements,
    we use the <b>update()</b> method.</div>

In [24]:
myset = {"cat", "tiger", "dog", "cow"}
myset.update(["lion","geese","hawk","worm","dog","sheep","sheep"])
## Note that update() accepts any iterable, such as the list passed above, a tuple, or another set

print(myset)

{'tiger', 'hawk', 'worm', 'lion', 'cat', 'cow', 'sheep', 'dog', 'geese'}


<div style="text-align: justify">Note that ‘sheep’ appears once in the output because
duplicates are not allowed in sets. The function <b>len(myset)</b>
gives the number of elements of a set.</div>
    
<br>
<div style="text-align: justify">We can use the <b>remove()</b> or the <b>discard()</b> method to remove
an element in a set. For example, we remove “cat” by using
the remove() method and “tiger” by using the discard() method.</div>

In [25]:
myset = {"cat", "tiger", "dog", "cow"}
print(len(myset))
myset.remove("cat")
myset.discard("tiger")
print(myset)
print(len(myset))

4
{'dog', 'cow'}
2
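
The two methods differ when the element is absent: remove() raises a KeyError, while discard() does nothing. A minimal sketch of that difference:

```python
myset = {"cat", "tiger", "dog", "cow"}

myset.discard("lion")          # "lion" is absent: nothing happens
print(len(myset))              # 4

try:
    myset.remove("lion")       # "lion" is absent: KeyError
except KeyError as err:
    print("KeyError:", err)    # KeyError: 'lion'
```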


<div style="text-align: justify">We can use the <b>union()</b> method to join two or more sets in
    Python, or we can use the <b>update()</b> method that inserts all
elements from one set into another. Type the following code:</div>

In [26]:
myset1 = {"A", "B" , "C"}
myset2 = {1, 2, 3}
myset3 = myset1.union(myset2)
print(myset3)


myset4 = {"Alpha","beta","cappa"}
myset6 = myset1.union(myset2,myset4)
print(myset6)

#listset1 = list(myset1)
#listset1.append("B")
#listset1.append("C")
#listset2 = list(myset2)
#listset2.append("2")
#listset2.append("3")
#listset3 = list(myset3)
#listset3.append("beta")
#listset1.append("cappa")

#superlist = listset1 + listset2 + listset3
#print(superlist)

{'C', 1, 2, 3, 'B', 'A'}
{'C', 1, 2, 3, 'Alpha', 'A', 'beta', 'B', 'cappa'}
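
The difference between the two approaches is that union() returns a new set, whereas update() changes the set it is called on. A short sketch, where the name merged is illustrative:

```python
myset1 = {"A", "B", "C"}
myset2 = {1, 2, 3}

merged = myset1.union(myset2)    # returns a new set; myset1 is unchanged
print(len(merged), len(myset1))  # 6 3

myset1.update(myset2)            # modifies myset1 in place
print(len(myset1))               # 6
print(myset1 == merged)          # True
```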


<div style="text-align: justify">Furthermore, we can:</div>

- remove and return an arbitrary item of a set by using the pop() method (sets are unordered, so there is no "last" item);
- empty the set by using the clear() method; and
- delete the set completely using the keyword del.
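
A minimal sketch of these three operations. Because a set has no order, pop() should not be relied on to return any particular element; the name removed is used here only to hold whatever it returns.

```python
myset = {"cat", "tiger", "dog", "cow"}

removed = myset.pop()      # removes and returns one element; which one is not defined
print(removed in myset)    # False
print(len(myset))          # 3

myset.clear()              # the set still exists but is now empty
print(myset)               # set()
empty_length = len(myset)

del myset                  # the name myset no longer exists after this line
```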

## Dictionaries for Data Indexing

<br>
<div style="text-align: justify">A dictionary is a changeable collection of items indexed by key.
Since Python 3.7, a dictionary preserves the order in which items
were inserted. A Python dictionary has a key:value pair
for every element. Dictionaries are optimized to retrieve values
when the key is known. To create a dictionary in Python, we
separate key:value element pairs by commas and place them inside
curly braces { }. For instance, the following piece of
code creates a dictionary named mydict.</div>

In [27]:
mydict = {
    "name":"Python",
    "purpose":"Data Science",
    "year":2020
}

print(mydict)


{'name': 'Python', 'purpose': 'Data Science', 'year': 2020}


<div style="text-align: justify">The values can repeat and they can be of any data type.
However, keys must be unique and of an immutable (hashable) type, such as a string, number,
    or tuple. We use <b>square brackets</b> or the <b>get()</b> method to access a specified value
of a dictionary by referring to its key name as follows.</div>

In [28]:
# accesses value for key 'name'
print(mydict['name'])

# accesses value for key 'purpose'
print(mydict.get('purpose'))

Python
Data Science
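
The rules for keys can be tried out on a small example. The dictionary grid below is hypothetical and is used only to show which key types are accepted.

```python
grid = {}                          # hypothetical dictionary for illustration
grid[(0, 1)] = "tuple key works"   # a tuple of immutable values can be a key
grid[42] = "so do numbers"
grid["name"] = "first"
grid["name"] = "second"            # same key: the old value is overwritten

print(len(grid))                   # 3
print(grid["name"])                # second

try:
    grid[[0, 1]] = "list key"      # lists are mutable, so they cannot be keys
except TypeError as err:
    print("TypeError:", err)
```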


<div style="text-align: justify">When we pass a non-existent key to the get() method, it returns
None instead of raising an error.</div>

In [29]:
print(mydict.get('address'))

None
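
The get() method also accepts an optional second argument that is returned in place of None when the key is missing. A brief sketch, where the default "unknown" is an arbitrary choice for this example:

```python
mydict = {"name": "Python", "purpose": "Data Science", "year": 2020}

address = mydict.get("address", "unknown")  # default used because the key is missing
print(address)                              # unknown
print(mydict.get("name", "unknown"))        # Python (key exists, default ignored)
print("address" in mydict)                  # False: get() does not add the key
```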


<div style="text-align: justify">If we run print(mydict['address']), we get the following error:
KeyError: 'address'</div>

<br>
<div style="text-align: justify">This error indicates that the key ‘address’ does not exist. We
can alter the value of a specific element by referring to its key
name as follows:</div>

In [30]:
mydict["year"] = 2019
mydict

{'name': 'Python', 'purpose': 'Data Science', 'year': 2019}

<div style="text-align: justify">We can loop through a dictionary by using a for loop that
    returns <i>keys</i> of the dictionary.</div>

In [31]:
for k in mydict:
    print(k)

name
purpose
year


<div style="text-align: justify">We can return the <i>values</i> as well.</div>

In [32]:
for k in mydict:
    print(mydict[k])

Python
Data Science
2019


<div style="text-align: justify">We can get the same output if we use the following.</div>

In [33]:
for k in mydict.values():
    print(k)

Python
Data Science
2019


<div style="text-align: justify">We can loop through a dictionary and access both keys and
values by using the method items() as follows.</div>

In [34]:
for x, y in mydict.items():
    print(x, y)

name Python
purpose Data Science
year 2019


<div style="text-align: justify">We can check whether a key is present in the dictionary by
    using a conditional <i>if</i> statement.</div>

In [35]:
if "purpose" in mydict:
    print("'purpose' is one of the valid keys")

'purpose' is one of the valid keys


<div style="text-align: justify">A new element can be added to the dictionary by using a new
key and assigning a value to this key, as given below.</div>

In [36]:
mydict["pages"] = 300
print(mydict)

{'name': 'Python', 'purpose': 'Data Science', 'year': 2019, 'pages': 300}


<div style="text-align: justify">The method <i>pop()</i> removes the element with the specified
    key name. The keyword <i>del</i> also does the same.</div>

In [37]:
mydict.pop("year")
# or use del mydict["year"] to get same result
print(mydict)

{'name': 'Python', 'purpose': 'Data Science', 'pages': 300}
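
One practical difference between the two is that pop() returns the removed value, and it can take a default to avoid a KeyError for a missing key. A minimal sketch; the key "isbn" is hypothetical and deliberately absent.

```python
mydict = {"name": "Python", "purpose": "Data Science", "year": 2019, "pages": 300}

year = mydict.pop("year")            # removes the key and returns its value
print(year)                          # 2019

missing = mydict.pop("isbn", None)   # hypothetical key; the default avoids a KeyError
print(missing)                       # None

del mydict["pages"]                  # del removes the key without returning anything
print(mydict)                        # {'name': 'Python', 'purpose': 'Data Science'}
```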


<div style="text-align: justify">The keyword <i>del</i> removes the whole dictionary when we
use del mydict. The method clear() deletes all elements of a
dictionary.</div>

In [38]:
mydict.clear()
mydict

{}

## DataFrame
We can use the DataFrame class from the pandas package to display a dictionary of lists as a table in Python.

In [4]:
import pandas as pd
data = {
    'color': ['blue','green','yellow','red','black'],
    'items' : ['ball','pen','pencil','marker','mug'],
    'price' : [2.5,1.2,0.6,4.5,9]
}
myframe = pd.DataFrame(data)
myframe

| | color | items | price |
|---|---|---|---|
| 0 | blue | ball | 2.5 |
| 1 | green | pen | 1.2 |
| 2 | yellow | pencil | 0.6 |
| 3 | red | marker | 4.5 |
| 4 | black | mug | 9.0 |


In [5]:
mydata = {
    'Employee Name' : ['Muthu', 'Ali','Cheong'],
    'Skillset' : ['Python','Powerpoint','Tablue'],
    'Experience(years)' : [4,5,2]
}

myframe = pd.DataFrame(mydata)
myframe

| | Employee Name | Skillset | Experience(years) |
|---|---|---|---|
| 0 | Muthu | Python | 4 |
| 1 | Ali | Powerpoint | 5 |
| 2 | Cheong | Tablue | 2 |


In [6]:
myframe2 = pd.DataFrame(mydata, columns = ['Experience(years)','Employee Name'])
myframe2

| | Experience(years) | Employee Name |
|---|---|---|
| 0 | 4 | Muthu |
| 1 | 5 | Ali |
| 2 | 2 | Cheong |


We can set custom index labels for a DataFrame using the snippet below.

In [8]:
myframe3 = pd.DataFrame(mydata, index = ['zero','one',2])
myframe3


| | Employee Name | Skillset | Experience(years) |
|---|---|---|---|
| zero | Muthu | Python | 4 |
| one | Ali | Powerpoint | 5 |
| 2 | Cheong | Tablue | 2 |


In [9]:
import numpy as np

np.arange(15).reshape((3,5)) 
# Generates the integers 0 to 14 and reshapes them into a 3x5 array

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])