# Data Types for Data Science in Python

## Introduction and lists
### Data types
- The data type system sets the stage for the capabilities of the language.
- Understanding data types empowers you as a data scientist.

### Container sequence
- Hold other types of data
- Used for aggregation, sorting, and more
- Can be mutable (list, set) or immutable (tuple)
- Iterable
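
As a minimal sketch of the mutable/immutable distinction above, a list accepts item assignment while a tuple rejects it. The variable names here are illustrative and do not come from the original notes.

```python
# Illustrative names; not part of the original notebook.
flavors_list = ['vanilla', 'chocolate']
flavors_tuple = ('vanilla', 'chocolate')

# Lists are mutable: item assignment changes the list in place.
flavors_list[0] = 'strawberry'
print(flavors_list)

# Tuples are immutable: item assignment raises TypeError.
try:
    flavors_tuple[0] = 'strawberry'
except TypeError as err:
    tuple_error = err  # keep a reference; `err` is cleared after the block
    print('TypeError:', err)
```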

## >> Lists
- Hold data in the order it was added
- Mutable
- Indexable by position

### Accessing single items in a list

In [1]:
cookies = ['chocolate chip','peanut butter','sugar']

In [2]:
cookies.append('Tirggel')

In [3]:
print(cookies)

['chocolate chip', 'peanut butter', 'sugar', 'Tirggel']


In [4]:
print(cookies[2])

sugar


### Combining Lists
- The `+` operator combines two lists into a new list

In [5]:
cakes = ['strawberry','vanilla']

In [6]:
desserts = cookies + cakes
print(desserts)

['chocolate chip', 'peanut butter', 'sugar', 'Tirggel', 'strawberry', 'vanilla']


- The `.extend()` method appends the items of one list to the end of another, modifying that list in place
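
A short sketch of `.extend()`, using fresh copies (the `_demo` names are assumptions made here) so the notebook's `cookies` list stays untouched. Unlike `+`, `.extend()` changes the list it is called on and returns `None`.

```python
# Fresh copies so the notebook's own lists are not modified.
cookies_demo = ['chocolate chip', 'peanut butter', 'sugar']
cakes_demo = ['strawberry', 'vanilla']

# .extend() appends every item of cakes_demo to cookies_demo in place.
result = cookies_demo.extend(cakes_demo)

print(cookies_demo)  # now holds all five items
print(result)        # None: the method does not return a new list
```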

### Finding Elements in a List
- `.index()` method locates the position of a data element in a list

In [7]:
position = cookies.index('sugar')
print(position)

2


In [8]:
cookies[3]

'Tirggel'

### Removing Elements in a List
- The `.pop()` method removes an item from a list by position and returns it, so you can save it

In [9]:
name = cookies.pop(position)
print(name)

sugar


In [10]:
print(cookies)

['chocolate chip', 'peanut butter', 'Tirggel']


### Iterating over lists
- `for` loops are the most common way of iterating over a list

In [11]:
for cookie in cookies:
    print(cookie)

chocolate chip
peanut butter
Tirggel


### Sorting lists
- The `sorted()` function sorts data in numerical or alphabetical order and returns a new list

In [12]:
print(cookies)

['chocolate chip', 'peanut butter', 'Tirggel']


In [13]:
sorted_cookies = sorted(cookies)
print(sorted_cookies)

['Tirggel', 'chocolate chip', 'peanut butter']
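
`'Tirggel'` comes first in the output above because Python's default string comparison places uppercase letters before lowercase ones. If a case-insensitive order is wanted, one option is to pass `key=str.lower`. This is a sketch, and the `_demo` name is an assumption made here.

```python
cookies_demo = ['chocolate chip', 'peanut butter', 'Tirggel']

# Default ordering compares characters by code point, so 'T' precedes 'c'.
default_order = sorted(cookies_demo)

# key=str.lower compares lowercased copies; the original strings are returned.
case_insensitive = sorted(cookies_demo, key=str.lower)

print(default_order)
print(case_insensitive)
```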


## >> Tuples
- Hold data in order
- Indexable by position
- Immutable
- Useful for pairing related values
- Unpackable

### Zipping tuples
- Tuples are commonly created by zipping lists together with `zip()`
- Two lists: `us_cookies`, `in_cookies`

In [15]:
us_cookies = ['Chocolate Chip', 'Brownies', 'Peanut Butter', 'Oreos', 'Oatmeal Raisin']
in_cookies = ['Punjabi', 'Fruit Cake Rusk', 'Marble Cookies', 'Kaju Pista Cookies', 'Almond Cookies']

In [16]:
top_pairs = list(zip(us_cookies, in_cookies))
print(top_pairs)

[('Chocolate Chip', 'Punjabi'), ('Brownies', 'Fruit Cake Rusk'), ('Peanut Butter', 'Marble Cookies'), ('Oreos', 'Kaju Pista Cookies'), ('Oatmeal Raisin', 'Almond Cookies')]


### Unpacking tuples
- Unpacking tuples is an expressive way of working with data

In [20]:
us_num_1, in_num_1 = top_pairs[0]
print(us_num_1, in_num_1, sep= " >>> ")

Chocolate Chip >>> Punjabi


### More unpacking in Loops
- Unpacking is especially powerful in loops

In [21]:
for us_cookie, in_cookie in top_pairs:
    print(in_cookie)
    print(us_cookie)

Punjabi
Chocolate Chip
Fruit Cake Rusk
Brownies
Marble Cookies
Peanut Butter
Kaju Pista Cookies
Oreos
Almond Cookies
Oatmeal Raisin


### Enumerating positions
- Another useful tuple creation method is the `enumerate()` function
- Enumeration is used in loops to return the position and the data in that position while looping

In [23]:
for idx, item in enumerate(top_pairs):
    us_cookie, in_cookie = item
    print(idx, us_cookie, in_cookie)

0 Chocolate Chip Punjabi
1 Brownies Fruit Cake Rusk
2 Peanut Butter Marble Cookies
3 Oreos Kaju Pista Cookies
4 Oatmeal Raisin Almond Cookies


### Be careful when making tuples
- Use `zip()`, `enumerate()`, or `()` to make tuples

In [24]:
item = ('vanilla','chocolate')
print(item)

('vanilla', 'chocolate')


Beware of trailing commas! A comma after a single value creates a one-element tuple.

In [25]:
item2 = 'butter',
print(item2)

('butter',)
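
To make the pitfall concrete, the sketch below compares three spellings. The comma, not the parentheses, is what makes the tuple. The variable names are illustrative.

```python
with_comma = 'butter',     # trailing comma: a one-element tuple
in_parens = ('butter')     # parentheses alone: still a plain string
one_item = ('butter',)     # the explicit way to write a one-element tuple

print(type(with_comma).__name__)
print(type(in_parens).__name__)
print(with_comma == one_item)
```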


## >> Sets for unordered and unique data

### Set
- Unique
- Unordered
- Mutable
- Python's implementation of Set Theory from Mathematics

### Creating Sets
- Sets are commonly created from a list with `set()`; duplicate values are dropped

In [26]:
cookies_eaten_today = ['chocolate chip','peanut butter','chocolate chip','oatmeal cream','chocolate chip']
types_of_cookies_eaten = set(cookies_eaten_today)
print(types_of_cookies_eaten)

{'peanut butter', 'chocolate chip', 'oatmeal cream'}


### Modifying Sets
- `.add()` adds single elements
- `.update()` merges in another set or list

In [28]:
types_of_cookies_eaten.add('biscotti')
types_of_cookies_eaten.add('chocolate chip')
print(types_of_cookies_eaten)

{'peanut butter', 'chocolate chip', 'biscotti', 'oatmeal cream'}


### Updating Sets

In [29]:
cookies_hugo_ate = ['chocolate chip','anzac']
types_of_cookies_eaten.update(cookies_hugo_ate)
print(types_of_cookies_eaten)

{'oatmeal cream', 'peanut butter', 'chocolate chip', 'anzac', 'biscotti'}


### Removing data from sets
- `.discard()` safely removes an element from the set by value; no error is raised if the element is absent
- `.pop()` removes and returns an arbitrary element from the set (raises `KeyError` when the set is empty)


In [30]:
types_of_cookies_eaten.discard('biscotti')
print(types_of_cookies_eaten)

{'oatmeal cream', 'peanut butter', 'chocolate chip', 'anzac'}


In [31]:
types_of_cookies_eaten.pop()
types_of_cookies_eaten.pop()

'peanut butter'
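
The edge cases of the two removal methods can be sketched as follows. The set `snacks` and its contents are illustrative, not data from the original notebook.

```python
# Illustrative set; not from the original notebook.
snacks = {'anzac', 'biscotti'}

# .discard() on a missing value does nothing and raises no error.
snacks.discard('macaron')

# .pop() returns an arbitrary element, so which one comes out is not guaranteed.
popped = snacks.pop()
snacks.pop()  # the set is now empty

# Popping from an empty set raises KeyError.
try:
    snacks.pop()
except KeyError as err:
    empty_error = err
    print('KeyError:', err)
```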

### Set Operations - Similarities
- The `.union()` method returns a set of all the names found in either set (logical `or`; operator `|`)
- The `.intersection()` method identifies data found in both sets (logical `and`; operator `&`)

In [32]:
cookies_jason_ate = set(['chocolate chip','oatmeal cream','peanut butter'])
cookies_hugo_ate = set(['chocolate chip','anzac'])
cookies_jason_ate.union(cookies_hugo_ate)

{'anzac', 'chocolate chip', 'oatmeal cream', 'peanut butter'}

In [33]:
cookies_jason_ate.intersection(cookies_hugo_ate)

{'chocolate chip'}
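
Python sets also support operator forms of these methods: `|` for union and `&` for intersection. The sketch below redefines the two sets so it runs on its own. One difference worth knowing is that the methods accept any iterable, while the operators require both operands to be sets.

```python
cookies_jason_ate = {'chocolate chip', 'oatmeal cream', 'peanut butter'}
cookies_hugo_ate = {'chocolate chip', 'anzac'}

either = cookies_jason_ate | cookies_hugo_ate  # same as .union()
both = cookies_jason_ate & cookies_hugo_ate    # same as .intersection()
print(either == cookies_jason_ate.union(cookies_hugo_ate))
print(both == cookies_jason_ate.intersection(cookies_hugo_ate))

# The method accepts a list; the operator does not.
from_list = cookies_jason_ate.union(['anzac'])
try:
    cookies_jason_ate | ['anzac']
except TypeError as err:
    operator_error = err
    print('TypeError:', err)
```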

### Set Operations - Differences
- The `.difference()` method identifies data present in the set on which the method is called that is not in the arguments (operator `-`)
- The target matters: `a.difference(b)` and `b.difference(a)` generally give different results

In [34]:
cookies_jason_ate.difference(cookies_hugo_ate)

{'oatmeal cream', 'peanut butter'}

In [35]:
cookies_hugo_ate.difference(cookies_jason_ate)

{'anzac'}
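
The same asymmetry shows up with the `-` operator, which is the operator form of `.difference()` for two sets. A brief, self-contained sketch:

```python
cookies_jason_ate = {'chocolate chip', 'oatmeal cream', 'peanut butter'}
cookies_hugo_ate = {'chocolate chip', 'anzac'}

# Order of operands matters, just as the target matters for .difference().
jason_only = cookies_jason_ate - cookies_hugo_ate
hugo_only = cookies_hugo_ate - cookies_jason_ate

print(jason_only)
print(hugo_only)
```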

## >> Using dictionaries

### Creating and looping through dictionaries
- Hold data in key/value pairs
- Nestable (use a dictionary as the value of a key within a dictionary)
- Iterable
- Created by `dict()` or `{}`

In [36]:
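# `galleries` is expected to be an iterable of (name, zip_code) pairs.
# It is not defined anywhere in this notebook, hence the NameError below.
art_galleries = {}
for name, zip_code in galleries:
    art_galleries[name] = zip_code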

NameError: name 'galleries' is not defined
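
The cell above fails because `galleries` is never defined in these notes. The sketch below supplies a small stand-in so the loop can run. The gallery names and zip codes are hypothetical placeholders, not the original data, and the nested dictionary at the end is only an illustration of the "nestable" point.

```python
# Hypothetical placeholder data; the original `galleries` data is not shown.
galleries = [
    ('Gallery A', '10001'),
    ('Gallery B', '10011'),
]

# Build the dictionary one key/value pair at a time, as in the cell above.
art_galleries = {}
for name, zip_code in galleries:
    art_galleries[name] = zip_code

# .items() yields (key, value) tuples, which unpack in the loop header.
for name, zip_code in art_galleries.items():
    print(name, zip_code)

# dict() accepts an iterable of pairs and builds the same mapping.
same_mapping = dict(galleries)

# Nesting: a dictionary used as the value of a key (illustrative structure).
galleries_by_zip = {'10001': {'Gallery A': 'placeholder details'}}
print(galleries_by_zip['10001']['Gallery A'])
```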