## Standard container sequences 

### Mutable

#### Lists
* Hold data in order it was added
* Mutable
* Indexable (access items by position)

#### Sets
* Unique
* Unordered
* Mutable
* Python's implementation of Set Theory from Mathematics

#### Dictionaries
* Hold data in key/value pairs
* Nestable (use a dictionary as the value of a key within a dictionary)
* Iterable
* Created by dict() or {}

> Asking for a key that does not exist raises a `KeyError`, which stops your program unless the error is handled

### Immutable

#### Tuple
* Hold data in order
* Indexable (access items by position)
* Immutable
* Pairing
* Unpackable
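
The tuple properties listed above can be checked with a small sketch. The `point` data and variable names are made up for illustration.

```python
# A tuple keeps its items in order and supports indexing
point = ('x', 10, 20)  # hypothetical example data
label, first, second = point  # unpack into three names

# Item assignment is not allowed on a tuple
try:
    point[1] = 99
    changed = True
except TypeError:
    changed = False

print(label, first, second, changed)
```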

In [18]:
# list
mylist = list('abcde')
herlist = list('fghij')
display(mylist, herlist)

['a', 'b', 'c', 'd', 'e']

['f', 'g', 'h', 'i', 'j']

In [19]:
#adding an item to a list
mylist.append('hello')
display(mylist)
mylist.append(herlist)
display(mylist)

['a', 'b', 'c', 'd', 'e', 'hello']

['a', 'b', 'c', 'd', 'e', 'hello', ['f', 'g', 'h', 'i', 'j']]

In [22]:
#reset
mylist = list('abcde')
herlist = list('fghij')
#combine lists
display(mylist + herlist)
mylist.extend(herlist) # occurs inplace!
display(mylist)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

In [23]:
#access elements in a list
display(mylist[1])
display(mylist.index('d'))

'b'

3

In [24]:
# remove an item from a list
mylist.pop() # removes the last item by default and returns it

'j'

In [26]:
mylist.pop(1) # can take a positional argument (the index to remove)

'b'

In [27]:
#sort list
sorted(herlist) # returns a new list in numerical or alphabetical order

['f', 'g', 'h', 'i', 'j']
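
As a complement to the cell above, here is a minimal sketch contrasting `sorted()` with the in-place `list.sort()` method. The `letters` list is invented for the example.

```python
letters = list('dbeca')  # hypothetical example data

# sorted() builds a new list and leaves the original alone
ordered = sorted(letters)
print(letters)
print(ordered)

# .sort() reorders the list in place and returns None
result = letters.sort(reverse=True)
print(letters, result)
```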

In [35]:
#reset
mylist = list('abcde')
herlist = list('fghij')
#create list of tuples by zipping lists together
z = list(zip(mylist, herlist))
display(z)

[('a', 'f'), ('b', 'g'), ('c', 'h'), ('d', 'i'), ('e', 'j')]

In [36]:
#unpack tuple
a,b = z[0]
display(a) #this can be done directly within loops

'a'

In [37]:
# to create tuples with the position and the data in that position while looping
for ind, it in enumerate(mylist):
    print(ind, it)

0 a
1 b
2 c
3 d
4 e


In [54]:
# to modify sets
s = set(mylist)
display(s, type(s))
s.add('h')
display(s)
s.update(herlist) # occurs inplace!
print(s)

{'a', 'b', 'c', 'd', 'e'}

set

{'a', 'b', 'c', 'd', 'e', 'h'}

{'f', 'b', 'd', 'c', 'e', 'i', 'a', 'g', 'h', 'j'}


To remove data from sets:

* `discard()` safely removes an element from the set by value (no error if the element is missing)
* `pop()` removes and returns an arbitrary element from the set (`KeyError` when empty)

In [55]:
s.discard('f')
display(s)

{'a', 'b', 'c', 'd', 'e', 'g', 'h', 'i', 'j'}

In [56]:
s.pop()

'b'

> Set Operations
* `.union()` method returns a set of all elements found in either set (or, `|`)
* `.intersection()` method identifies the elements the sets have in common (and, `&`)
* `.difference()` method identifies data present in the set on which the method was used that is not in the arguments (`-`)
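
A short sketch of the three operations, using two made-up sets. Results are sorted before printing because sets have no fixed display order.

```python
left = set('abcde')   # hypothetical example data
right = set('defgh')

both = left.union(right)            # same as left | right
shared = left.intersection(right)   # same as left & right
only_left = left.difference(right)  # same as left - right

print(sorted(both))
print(sorted(shared))
print(sorted(only_left))
```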

In [58]:
#create dict
mydict = {}
mylist = list('abcde')
herlist = list('fghij')
for a,b in list(zip(mylist, herlist)):
    mydict[a]=b
    
print(mydict)

{'a': 'f', 'b': 'g', 'c': 'h', 'd': 'i', 'e': 'j'}


In [59]:
#to find values
display(mydict['b']) # raises a KeyError if the key does not exist
display(mydict.get('b')) # returns None for a missing key instead of raising, so no exception handling is needed

'g'

'g'
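
The cell above only looks up a key that exists. The sketch below shows what happens with a missing key, using a made-up `lookup` dictionary.

```python
lookup = {'a': 'f', 'b': 'g'}  # hypothetical example data

# Square brackets raise KeyError for a missing key
try:
    lookup['z']
    outcome = 'found'
except KeyError:
    outcome = 'KeyError'

# .get() returns None, or a chosen default, instead of raising
print(outcome, lookup.get('z'), lookup.get('z', 'n/a'))
```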

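* `.update()` method updates a dictionary from another dictionary, an iterable of key/value tuples, or keyword arguments
* `del` statement deletes a key/value pair (raises a `KeyError` if the key is missing)
* `.pop()` method removes a key and returns its value; it raises a `KeyError` for a missing key unless you pass a default as the second argument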
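
A minimal sketch of the three input forms `.update()` accepts, plus `.pop()` with a default. The `stock` dictionary and its keys are invented for the example.

```python
stock = {'a': 'f', 'b': 'g'}  # hypothetical example data

# .update() accepts another dict, an iterable of pairs, or keywords
stock.update({'c': 'h'})
stock.update([('d', 'i')])
stock.update(e='j')

# .pop() with a default returns it instead of raising KeyError
missing = stock.pop('z', None)
removed = stock.pop('a')

print(stock, missing, removed)
```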

In [60]:
del mydict['b']
mydict

{'a': 'f', 'c': 'h', 'd': 'i', 'e': 'j'}

In [62]:
mydict.pop('a')

'f'

In [63]:
mydict

{'c': 'h', 'd': 'i', 'e': 'j'}

* `.items()` method returns a view of (key, value) pairs we can iterate over
* `in` checks whether a key is in the dict
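
Both points can be tried on a small dictionary. The sketch below assumes a made-up `mapping` and also shows that `in` looks at keys, not values.

```python
mapping = {'c': 'h', 'd': 'i', 'e': 'j'}  # hypothetical example data

# .items() yields (key, value) tuples that unpack in the loop header
pairs = []
for key, value in mapping.items():
    pairs.append(key + value)

# `in` tests keys; use .values() to test values
has_key = 'c' in mapping
has_value_as_key = 'h' in mapping
has_value = 'h' in mapping.values()
print(pairs, has_key, has_value_as_key, has_value)
```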

##### to read from a file using csv reader
* `csv.reader()` reads a file object and returns each line of the file as a list of strings

```
import csv

# newline='' lets the csv module handle line endings itself
with open('csvfile.csv', 'r', newline='') as csvfile:
    for row in csv.reader(csvfile):
        print(row)
```

##### to create a dict from a file use DictReader

```
import csv
with open('csvfile.csv', 'r', newline='') as csvfile:
    for row in csv.DictReader(csvfile):
        print(row)
```
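
To compare the two readers without needing a file on disk, the sketch below feeds both the same in-memory text through `io.StringIO`. The three-column data is a made-up stand-in, not the real dataset.

```python
import csv
import io

# In-memory stand-in for a file on disk (hypothetical data)
text = "Area,Item,1961\nAfghanistan,Almonds,0.0\nBangladesh,Oilcrops,64107.0\n"

rows = list(csv.reader(io.StringIO(text)))
records = list(csv.DictReader(io.StringIO(text)))

print(rows[0])      # header row as a list of strings
print(records[0])   # first data row keyed by the header names
```

Note that `csv.reader` includes the header line as an ordinary row, while `csv.DictReader` consumes it to name the fields.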

In [18]:
# Import the csv module
import csv

# Create the file object: csvfile
csvfile = open ('../datasets/clean/production.csv', 'r')

d = csv.DictReader(csvfile)

for row in d:
    print(row)

{'': '2', 'Area': 'Afghanistan', 'Item': 'Almonds, with shell', '1961': '', '1962': '', '1963': '', '1964': '', '1965': '', '1966': '', '1967': '', '1968': '', '1969': '', '1970': '', '1971': '', '1972': '', '1973': '', '1974': '', '1975': '0.0', '1976': '9800.0', '1977': '9000.0', '1978': '12000.0', '1979': '10500.0', '1980': '9900.0', '1981': '8000.0', '1982': '11000.0', '1983': '9700.0', '1984': '10500.0', '1985': '9000.0', '1986': '10000.0', '1987': '9000.0', '1988': '9000.0', '1989': '8800.0', '1990': '9500.0', '1991': '9000.0', '1992': '9900.0', '1993': '9000.0', '1994': '9000.0', '1995': '9000.0', '1996': '9000.0', '1997': '9000.0', '1998': '9000.0', '1999': '11000.0', '2000': '12000.0', '2001': '15000.0', '2002': '11774.0', '2003': '14000.0', '2004': '14700.0', '2005': '15630.0', '2006': '20000.0', '2007': '31481.0', '2008': '42000.0', '2009': '43183.0', '2010': '56000.0', '2011': '60611.0', '2012': '62000.0', '2013': '42215.0', '2014': '27400.0', '2015': '24246.0', '2016': '32

{'': '2426', 'Area': 'Bangladesh', 'Item': 'Oilcrops, Oil Equivalent', '1961': '64107.0', '1962': '66688.0', '1963': '66860.0', '1964': '60632.0', '1965': '65408.0', '1966': '72484.0', '1967': '82274.0', '1968': '90036.0', '1969': '93144.0', '1970': '85158.0', '1971': '86897.0', '1972': '75357.0', '1973': '73399.0', '1974': '72693.0', '1975': '82660.0', '1976': '79020.0', '1977': '83843.0', '1978': '101860.0', '1979': '134637.0', '1980': '143136.0', '1981': '138236.0', '1982': '146107.0', '1983': '149571.0', '1984': '153373.0', '1985': '166697.0', '1986': '155701.0', '1987': '146004.0', '1988': '148483.0', '1989': '145044.0', '1990': '148388.0', '1991': '149909.0', '1992': '159200.0', '1993': '166858.0', '1994': '156834.0', '1995': '160381.0', '1996': '158508.0', '1997': '161370.0', '1998': '162770.0', '1999': '150351.0', '2000': '130662.0', '2001': '125564.0', '2002': '126590.0', '2003': '121134.0', '2004': '129594.0', '2005': '166110.0', '2006': '156025.0', '2007': '161145.0', '2008'

{'': '39715', 'Area': 'Eastern Africa', 'Item': 'Mangoes, mangosteens, guavas', '1961': '220914.0', '1962': '233014.0', '1963': '236014.0', '1964': '249015.0', '1965': '252115.0', '1966': '266215.0', '1967': '271316.0', '1968': '273416.0', '1969': '295517.0', '1970': '312717.0', '1971': '328818.0', '1972': '334918.0', '1973': '356684.0', '1974': '432169.0', '1975': '384855.0', '1976': '458760.0', '1977': '406555.0', '1978': '400710.0', '1979': '377396.0', '1980': '401131.0', '1981': '411797.0', '1982': '426479.0', '1983': '437001.0', '1984': '452867.0', '1985': '463883.0', '1986': '478883.0', '1987': '495767.0', '1988': '506057.0', '1989': '520350.0', '1990': '530903.0', '1991': '545686.0', '1992': '537747.0', '1993': '553923.0', '1994': '541039.0', '1995': '550298.0', '1996': '552009.0', '1997': '528727.0', '1998': '619023.0', '1999': '639714.0', '2000': '618484.0', '2001': '741169.0', '2002': '802006.0', '2003': '805973.0', '2004': '813726.0', '2005': '991533.0', '2006': '1030785.0'

{'': '42102', 'Area': 'Central America', 'Item': 'Chick peas', '1961': '135056.0', '1962': '129108.0', '1963': '97166.0', '1964': '123799.0', '1965': '135361.0', '1966': '151760.0', '1967': '165342.0', '1968': '179277.0', '1969': '183240.0', '1970': '185575.0', '1971': '166945.0', '1972': '228054.0', '1973': '226027.0', '1974': '249211.0', '1975': '195072.0', '1976': '73488.0', '1977': '271593.0', '1978': '215474.0', '1979': '355537.0', '1980': '153248.0', '1981': '148020.0', '1982': '164671.0', '1983': '153789.0', '1984': '172613.0', '1985': '139472.0', '1986': '146367.0', '1987': '243393.0', '1988': '88400.0', '1989': '155778.0', '1990': '180147.0', '1991': '194581.0', '1992': '112987.0', '1993': '182893.0', '1994': '139678.0', '1995': '167244.0', '1996': '278749.0', '1997': '243712.0', '1998': '98469.0', '1999': '197626.0', '2000': '233809.0', '2001': '326119.0', '2002': '235053.0', '2003': '142800.0', '2004': '104527.0', '2005': '133976.0', '2006': '163348.0', '2007': '148495.0', '


## Advanced container sequences: Collections Module

#### Counter
A special dictionary used for counting data and measuring frequency

* `.most_common()` method returns a list of `(element, count)` tuples sorted by count in descending order; pass a number `n` to get only the top `n`
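
A minimal sketch of building a `Counter` directly from an iterable. The `words` list is invented for the example.

```python
from collections import Counter

words = ['wheat', 'peas', 'wheat', 'rye', 'wheat', 'peas']  # hypothetical data
counts = Counter(words)

top_two = counts.most_common(2)
print(top_two)

# Missing keys count as 0 instead of raising KeyError
print(counts['barley'])
```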

#### defaultdict
Pass it a default factory (such as `int` or `list`); any key that does not exist yet is created with the factory's value the first time it is accessed. Apart from that it works exactly like a dictionary.
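
As an illustration, here is a sketch that groups values with `defaultdict(list)`. The country and crop pairs are made up.

```python
from collections import defaultdict

# Hypothetical (country, crop) pairs
pairs = [('Denmark', 'Wheat'), ('Denmark', 'Peas'), ('France', 'Wheat')]

crops_by_country = defaultdict(list)
for country, crop in pairs:
    # No need to check whether the key exists first
    crops_by_country[country].append(crop)

print(dict(crops_by_country))

# Reading a missing key creates it with the default value
print(crops_by_country['Spain'], 'Spain' in crops_by_country)
```

One side effect worth knowing: merely reading a missing key adds it to the dictionary, so use `in` or `.get()` when you only want to check.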

#### OrderedDict
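A dictionary that maintains insertion order (plain dictionaries also preserve insertion order since Python 3.7)

* `.popitem()` method removes and returns the most recently inserted key/value pair, so repeated calls return items in reverse insertion order
> you can use the `last=False` keyword argument to remove and return the items in insertion order, oldest first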
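
A short sketch of `.popitem()` from both ends, using a made-up three-item `OrderedDict`.

```python
from collections import OrderedDict

od = OrderedDict()
for key in 'abc':  # hypothetical example data
    od[key] = key.upper()

last = od.popitem()             # most recently inserted pair
first = od.popitem(last=False)  # oldest pair
print(last, first, list(od))
```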

#### namedtuple
* A tuple where each position (column) has a name
* Ensures each record has the same properties
* An alternative to a pandas DataFrame row

> Each field is available as an attribute of the namedtuple (with dot notation: `namedtuple.positionname`)
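
A minimal sketch of defining and using a namedtuple. The `Loss` type and its field names are hypothetical, loosely modelled on the columns of a food-loss record.

```python
from collections import namedtuple

# Hypothetical record type with three named fields
Loss = namedtuple('Loss', ['country', 'crop', 'year'])

row = Loss('Denmark', 'Wheat', '2017')
print(row.country, row[1], row.year)  # dot notation and indexing both work

country, crop, year = row  # still unpacks like a plain tuple
print(row._asdict())
```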

In [6]:
# Import the csv module
import csv

# Create the file object: csvfile
csvfile = open ('../datasets/clean/production.csv', 'r')

# Create an empty list: 
prod_list = []

# Loop over a csv reader on the file object
for row in csv.reader(csvfile):

    # Append each row of data
    prod_list.append(row)
    
# Print the first 2 records
print(prod_list[1:3]) #skip header

[['2', 'Afghanistan', 'Almonds, with shell', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '0.0', '9800.0', '9000.0', '12000.0', '10500.0', '9900.0', '8000.0', '11000.0', '9700.0', '10500.0', '9000.0', '10000.0', '9000.0', '9000.0', '8800.0', '9500.0', '9000.0', '9900.0', '9000.0', '9000.0', '9000.0', '9000.0', '9000.0', '9000.0', '11000.0', '12000.0', '15000.0', '11774.0', '14000.0', '14700.0', '15630.0', '20000.0', '31481.0', '42000.0', '43183.0', '56000.0', '60611.0', '62000.0', '42215.0', '27400.0', '24246.0', '32843.0', '27291.0', '34413.0', '38205.0', '18990.933333333334', '15382.579515921128'], ['5', 'Afghanistan', 'Anise, badian, fennel, coriander', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '500.0', '500.0', '200.0', '800.0', '1000.0', '1331.0', '500.0', '1556.0', '1941.0', '2000.0', '4200.0', '2500.0', '7100.0', '7000.0', '2500.0', '1000.0', '2000.0', '4100.0', '9000.0', '10000.0', '10846.0', '17000.0', '9000.

In [19]:
# Import necessary modules
from collections import Counter

# Create a Counter Object: 
lines_per_country = Counter()

# Loop over the  list
for row in prod_list:
    
    # Increment the counter for the country of the row by one
    lines_per_country[row[1]] += 1
    
# Print the 15 most common countries
print(lines_per_country.most_common(15))

[('World', 175), ('Net Food Importing Developing Countries', 166), ('Asia', 165), ('Low Income Food Deficit Countries', 165), ('Americas', 164), ('Africa', 159), ('Land Locked Developing Countries', 156), ('South America', 151), ('Least Developed Countries', 149), ('Eastern Asia', 143), ('Europe', 141), ('European Union (28)', 141), ('European Union (27)', 141), ('China', 140), ('Southern Europe', 138)]


In [25]:
# Imports
import csv
from collections import defaultdict

# Create the file object: 
csvfile = open ('../datasets/clean/eufoodloss.csv', 'r')

# Create an empty list: 
foodloss = []

# Loop over a csv reader on the file object
for row in csv.reader(csvfile):

    # Append each row of data
    foodloss.append(row)

print(foodloss[:5])

#instantiate
f_types = defaultdict(int) 

for row in foodloss:
    if row[6] == 'Harvest':
        f_types['Harvest'] += 1
    if row[6] == 'Storage':
        f_types['Storage'] += 1 
print('\n', f_types)

[['', 'country', 'crop', 'timepointyears', 'percentage_loss_of_quantity', 'activity', 'fsc_location1'], ['4', 'Denmark', 'Wheat', '2017', '', '', 'Farm'], ['5', 'Denmark', 'Peas, green', '2017', '', '', 'Farm'], ['6', 'Denmark', 'Peas, green', '2017', '', '', 'Pre-Harvest'], ['7', 'Denmark', 'Peas, green', '2017', '', '', 'Harvest']]

 defaultdict(<class 'int'>, {'Harvest': 13, 'Storage': 4})
