# B08 Badge: List Comprehensions: Filter and Map (represent) Lists.

Note: This is a **JUPYTER NOTEBOOK**. It's a type of website where you can edit and run computer programs (code). You interact with it in your web browser and you can find it via your Learn.

1. These blocks here are cells.
2. There are **TEXT CELLS** like this one with explanations of concepts.
3. And **CODE CELLS** with Python code (see below). Code cells have a ```In []``` written to the left.
4. You can **RUN CODE CELLS** by clicking on them and pressing **Shift + Enter**. When you run a cell, the code in it is executed (it "happens": the computer does what you asked it to do). The results of your code will appear underneath the cell.
5. As we go through these lessons, please READ text cells, and RUN code cells.
6. Good luck!

In [None]:
# Slowly we will start importing all the useful libraries at the beginning of each badge. Run this now!

import pprint as pp

# Learning objectives:

At the end of this badge you will know:

- How to use List Comprehensions - simplified Python 'loops' for filtering and mapping.
- How to read List Comps - first input, then condition, then output.
- How to use List Comps to pull out some useful information out of JSON data.

### 🔜 SPOILER ALERT:

You will also understand these lines of code:

In [None]:
# Example: Filter out the short words, and then simplify data to just lengths of words.

fruits = ["apple", "pear", "banana", "pineapple"]

lengths_of_long_words = [
    len(fruit)
    for fruit in fruits
    if len(fruit) > 5
]

print(lengths_of_long_words)

In [None]:
coffee_orders = [{'type': 'Latte', 'milk': 'Cow', 'sugar': 2}, 
                 {'type': 'Lungo', 'milk': 'None', 'sugar': 1}, 
                 {'type': 'Latte', 'milk': 'Oat', 'sugar': 2}]

sugars_in_vegan_coffees = [ order['sugar']
                            for order in coffee_orders 
                            if order['milk'] != 'Cow'
                          ]

print("Vegans drank a total of",sum(sugars_in_vegan_coffees), "sugars")

 🎯 End of learning objectives 🎯

### List Comprehensions: A special Python syntax to represent, 'comprehend', collections.

LIST COMPREHENSION is a special, very useful syntax that makes it faster and cleaner to change or filter collections.
Very frequently you have a collection of some objects - strings, numbers, or dictionaries and you want to...

- **FILTER DATA**: Only keep some values in the collection. Just like you would filter coffee grounds out of your coffee, using a paper filter that lets small particles through and stops larger particles.

- **MAP DATA**: Reduce complex data to simpler data or a different data format. Just like a geographical map represents terrain in a simplified, flat paper format.

Basically, all the other programming languages envy that Python has list comprehensions. They are super cool. Lucky for us!

In [None]:
# Example:  Simplify data to just lengths of words.

fruits = ["apple", "pear", "banana", "pineapple"]

lengths_of_words = [
                    len(fruit)
                    for fruit in fruits
                   ]

print(lengths_of_words)

Think about it like a conveyor belt - things go in on one side, and slightly changed things come out on the other side.

**By the way, you sort of read list comprehensions from the end, or at least from the middle-end**

One way to read a list comprehension is: Input -> Output

`
[ 
OUTPUT (some version of the 'item')
INPUT  (which is basically FOR 'item' IN 'items')
]
`


**INPUT**: A definition of where the data will come from.

- **ITEMS**: A collection from which you will take the data.

- **ITEM**: A temporary variable that holds each thing from 'items' in turn. Call it what you'd like.

**OUTPUT**: A definition of what you want to turn each item into.

- Format for the OUTPUT which is using the 'item' somehow. 


The syntax for this is:

`result = [my_output for one_item in all_items]`

For readability it's best to add extra new lines (Python ignores new lines inside square brackets) and write it like this:

`result = [ 
            my_output
            for one_item in all_items 
           ]`
           
Notice, you can call the 'item' variable whatever you want. Make it meaningful.
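
To see what this syntax saves you, here is a small sketch that builds the same list twice: once with a classic `for` loop, and once with a list comprehension. It assumes you have already met `for` loops and `.append()`; the variable names are made up for this illustration.

```python
# A list comprehension and the longer 'for' loop that does the same job.
fruits = ["apple", "pear", "banana"]

# Version 1: a classic loop that appends each output to an empty list.
lengths_with_loop = []
for fruit in fruits:
    lengths_with_loop.append(len(fruit))

# Version 2: the same thing written as a list comprehension.
lengths_with_comprehension = [
    len(fruit)
    for fruit in fruits
]

print(lengths_with_loop)
print(lengths_with_comprehension)
print(lengths_with_loop == lengths_with_comprehension)
```

Both versions produce the same list. The comprehension just says it in fewer lines.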

## Example Case studies:

### Lowercase words

To compare a number of tweets, you might want to simplify all words to be lower case, so that the words `WOW`, `wow` and `Wow` are considered the same word. 
To do that, for each word in your list of words, you would want to represent it as a lowercase version of itself. 
This way `['WOW','wow', 'Wow']` would become `['wow','wow', 'wow']`, hence making it easier to count, analyse etc.

By the way, to represent a word as its lowercase version, you can use `some_word.lower()`, e.g. change `Just` into `just`, `Think` into `think` etc.

E.g. `"BAnanA".lower()` will return `"banana"`

In [None]:
# Note: 'some_word.lower()' turns that word into its lowercase version.

words = ['LoL', 'THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

lower_case_words = [ 
  word.lower() 
  for word in words 
]

print(lower_case_words)

### Represent words as only their lengths.

Example: For each word in words, represent it with that word's length e.g. change 'THIS' into 4, 'it' into 2 etc.

In [None]:
words = ['LoL', 'THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

lengths_of_words = [ 
  len(word) 
  for word in words 
]

print(lengths_of_words)

**What is Mapping?**

Notice: We are currently not filtering any data out and we keep everything.

If the input is a list of ten somethings, then we are returning a list of ten something-else's.

**Mapping**: A process of taking something, and representing it in a different, often simpler format. 
Like taking streets or countries, and representing them on a piece of paper, as in... a Map 🗺 ;)

We are 'mapping' the data, leaving the amount of it unchanged. This detail will be important in a minute.
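
A quick way to convince yourself of this is to compare the sizes of the input and the output. The sketch below uses a short, made-up list of words and checks that mapping did not change how many items there are.

```python
# Mapping changes each item, but not the number of items.
words = ['WOW', 'wow', 'Wow', 'lol']

lower_case_words = [
    word.lower()
    for word in words
]

print(lower_case_words)
print(len(words), len(lower_case_words))
print(len(words) == len(lower_case_words))
```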

### Simplify a list of dictionaries, to a list of just one of its values.

Example: Take a collection of coffee orders and represent it as just the amount of sugars. In this case, to check if we still have enough sugar.

In [None]:
coffee_orders = [ {'type': 'Latte', 'milk': 'Cow', 'sugar': 4}, 
                 {'type': 'Lungo', 'milk': 'None', 'sugar': 1}, 
                 {'type': 'Latte', 'milk': 'Oat', 'sugar': 2} ]

number_of_sugars = [ 
   order['sugar'] # Represent/simplify each order, to just the number of sugars.
   for order in coffee_orders 
]

print(number_of_sugars)

### Represent a list of dictionaries as something else, e.g. a string.

Example: Take a collection of coffee orders and represent them as a sentence, like "Latte coffee, with Cow milk and 4 sugars".

In one of the next badges about working with strings we will learn how to do this in a much cleaner way.

In [None]:
coffee_orders = [{'type': 'Latte', 'milk': 'Cow', 'sugar': 4}, 
                 {'type': 'Lungo', 'milk': 'None', 'sugar': 1}, 
                 {'type': 'Latte', 'milk': 'Oat', 'sugar': 2}]

orders_as_sentences = [ 
    order['type'] + " coffee, with "+ order['milk']+" milk and "+ str( order['sugar'] )+" sugars"
    for order in coffee_orders 
]

pp.pprint(orders_as_sentences)
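
As a small preview, here is one possible cleaner way to build the same sentences, using Python's f-strings (available from Python 3.6). That a later badge will use exactly this approach is an assumption; treat it as a sketch, not the official solution.

```python
# Same coffee orders as above, but each sentence is built with an f-string.
coffee_orders = [{'type': 'Latte', 'milk': 'Cow', 'sugar': 4},
                 {'type': 'Lungo', 'milk': 'None', 'sugar': 1},
                 {'type': 'Latte', 'milk': 'Oat', 'sugar': 2}]

orders_as_sentences = [
    # Values inside { } are inserted into the text; no str( ) or + needed.
    f"{order['type']} coffee, with {order['milk']} milk and {order['sugar']} sugars"
    for order in coffee_orders
]

for sentence in orders_as_sentences:
    print(sentence)
```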

### In a list represent each word as... itself. This does not make much sense, but why not.

Example: Take a list of words and represent each word as itself. It will make sense in a minute why you would do that.

Notice that in the place where output format is specified, we just leave the `word` with no changes. 
Not `len(word)` or `word.lower()` - just a simple, unchanged `word`.

In [None]:
words = ['LoL', 'THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

unchanged_words = [ 
   word # Just keep word as it is. Represent word as.... word.
   for word in words 
]

print(unchanged_words)

### And finally: 

## We can also FILTER the input list, and only keep some of its elements:

Optionally you can add a third line with a condition, that needs to be true for the item to be kept in the final result.


`
result = [  output
            input
            condition 
         ]
`

The syntax becomes: 

`
result = [  output
            for one_item in all_items
            if condition
         ]   
`

Note: This will FIRST check the condition, and THEN produce output only from the elements for which the condition was True.

Elements that fail the condition are dropped from the result. They are not kept in an unchanged form.
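
To make the difference concrete, the sketch below puts the two behaviours side by side on a short, made-up list. The second version uses a conditional expression (`... if ... else ...`) in the OUTPUT part. That is a different Python tool, shown here only for contrast.

```python
words = ['LoL', 'is', 'FUNNY', 'it']

# Filtering: words that fail the condition are dropped from the result.
filtered = [
    word.lower()
    for word in words
    if len(word) > 2
]

# Not filtering: every word is kept, but only the long ones are changed.
changed_some = [
    word.lower() if len(word) > 2 else word
    for word in words
]

print(filtered)      # ['lol', 'funny']
print(changed_some)  # ['lol', 'is', 'funny', 'it']
```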

**USEFUL!** The most meaningful order of reading a list comp is: input --> condition --> output

Note, this is how you read them. You still need to write them as seen in the above cell.

    for one_item in all_items
    if condition
    output of one_item
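
The reading order above matches the order in which an ordinary `for` loop would do the work. Here is a sketch, on a made-up list, with the loop and the comprehension next to each other (assuming you already know `for`, `if` and `.append()`):

```python
words = ['THIS', 'is', 'The', 'MOST', 'FUNNY']

# The loop follows the reading order: input -> condition -> output.
long_words_with_loop = []
for word in words:                                   # 1. input
    if len(word) > 3:                                # 2. condition
        long_words_with_loop.append(word.lower())    # 3. output

# The comprehension is written in a different order: output, input, condition.
long_words_with_comprehension = [
    word.lower()
    for word in words
    if len(word) > 3
]

print(long_words_with_loop)
print(long_words_with_comprehension)
```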

## Example Case Studies:

### Lowercase words, and only keep those longer than 3 characters.

Let's say that we discovered that most short words are not very meaningful, so we would like to unify the set by removing short words and lowercasing everything else.

In [None]:
words = ['LoL', 'THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

lowercased_long_words = [ 
    word.lower() # Turn a word into its lowercase version.
    for word in words # For each word in words list.
    if len(word) > 3 # But only keep those where word's length was over 3.
] 

print(lowercased_long_words)

### Represent all words starting with letter 't' as their lengths.

Let's say that we have a theory that words starting with 't' are all very long. To test it, we first will try to just represent them as their lengths.  

**NOTE ABOUT LEN( )**: Strings in Python behave much like lists of characters, so `"this"` can be treated a lot like `['t','h','i','s']`. 

We will talk more about it in one of the future badges. That means that to get the first character of a word, we just request the item at index 0. 
E.g. `my_word[0]` or even `"Banana"[0]`. In the same way, we can get the length of a word with `len("banana")`, as if it were `len(my_list_of_things)`.
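
Here is a short sketch of that idea. It shows that indexing, `len( )` and even a list comprehension work on a string the same way they work on a list. Python still treats a string and a list of characters as two different types.

```python
my_word = "banana"

print(my_word[0])       # First character, just like the first item of a list.
print(len(my_word))     # Number of characters, just like the size of a list.

# A list comprehension can take a string as its input, one character at a time.
letters = [
    letter
    for letter in my_word
]

print(letters)
print(my_word == letters)   # False: similar behaviour, but different types.
```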

In [None]:
words = ['LoL', 'THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

lengths_of_t_words = [ 
     len(word) # Turn a word into its length.
     for word in words # For each word in words list.
     if  word[0] == 't' # But only keep those where word's first letter is 't'.
] 

print(lengths_of_t_words)

# Notice that the RESULT of list comprehension is ALWAYS A LIST, even if it only has 1 item.

**Wait, what? What do you mean there's just one word starting with 't'?**

Is something not right here?! How about `"THIS"` and `"The"`? 
Well... technically they do not start with a `'t'`, but rather they start with a `'T'`.

**Python is case sensitive**, which means that `'T'` and `'t'` have nothing in common. 

You would need to lowercase first to confirm it starts with a `'t'` or `'T'`.

You could write an `or` condition to catch both versions:

`
if  word[0] == 't' or word[0] == 'T' ]
`

Or you could lowercase the word and compare it to `t`:


`
if  word.lower()[0] == 't' ]
`

### Oooooh. But what should be the order of reading a piece of code like this?


Try to read each block of code connected with `.` from the left-hand side. 

For example `word[0].lower()` means:  take word -> then just take the first character -> then lowercase it.

**Example:**

- Take word -> then just take the first character -> then lowercase it:

`
if  word[0].lower() == 't' ]  
`

- Take word -> lowercase the whole word -> and then get the first letter:

`
if  word.lower()[0] == 't' ]
`

Which one is better? With short words it does not matter, but imagine the words were long and you had a few million of them... then the first version would be faster, because it lowercases only one character instead of the whole word.

We will talk about all these scenarios and this type of daisy-chaining soon.
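
The sketch below runs both chains on one word to show that they give the same answer. It also shows `startswith( )`, a built-in string method that is one possible alternative. It is not part of this badge, but unlike `[0]` it does not fail on an empty string.

```python
word = "THIS"

first_then_lower = word[0].lower()   # 'THIS' -> 'T' -> 't'
lower_then_first = word.lower()[0]   # 'THIS' -> 'this' -> 't'

print(first_then_lower, lower_then_first)
print(first_then_lower == lower_then_first)

# Alternative: ask the lowercased word whether it starts with 't'.
print(word.lower().startswith('t'))

# On an empty string, ""[0] would raise an IndexError,
# while startswith( ) simply answers False.
print("".lower().startswith('t'))
```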

### Let's get back to our scenario... Represent all words starting with letter 't' as their lengths


In [None]:
words = ['LoL', 'THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

lengths_of_t_words = [ len(word)
                          for word in words
                          if  word[0].lower() == 't' ]

print(lengths_of_t_words)

Ok, this makes more sense!

### Simplify the data AND only keep some of them

For example, if we only wanted to get the 'sugar' content of vegan beverages, where the value of 'milk' is different from 'Cow':

In [None]:
# How many sugars go into non-cow coffees?

coffee_orders = [{'type': 'Latte', 'milk': 'Cow', 'sugar': 4}, 
                 {'type': 'Lungo', 'milk': 'None', 'sugar': 1}, 
                 {'type': 'Latte', 'milk': 'Oat', 'sugar': 2}]

vegan_orders_as_sugars = [ order['sugar']
                      for order in coffee_orders 
                      if order['milk'] != 'Cow']

print(vegan_orders_as_sugars)

### A very common and useful scenario: Keeping only some elements, but not changing them.

We can even use the words as they came, for output, and not change them at all, just by using the filtering part.

For example, let's just keep the words starting with a `'t'` but not change them in any way.

In [None]:
words = ['THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

t_words = [ 
    word # If you just want to filter words, but output should be the same as input.
    for word in words
    if word[0].lower() == 't' 
]

print(t_words)

### Mathematical functions: 'max', 'min', 'len', 'sum'.

You can use these built-in functions on collections:

- **max( )** - Get the largest element in a group. For strings it means 'last in alphabetical order'.
- **min( )** - Get the smallest element in a group. For strings it means 'first in alphabetical order'.
- **len( )** - Size of the collection. Can be used on lists and dicts, but also on strings.
- **sum( )** - Add up all elements. Just used for numbers. For now.

Now let's use these on your lists to achieve some very useful results:

In [None]:
print( max([3,4,5]) )
print( min([3,4,5]) )
print( len([3,4,5]) )
print( sum([3,4,5]) )

In [None]:
print( max(["banana", "kiwi", "plum"]) )
print( min(["banana", "kiwi", "plum"]) )
print( len(["banana", "kiwi", "plum"]) )
print(sum( ["banana", "kiwi", "plum"]) ) # Will error, yay! ;)
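
Since `sum( )` only works on numbers, one way around the error is to map the words to numbers first and then add those up. The sketch below does that with a list comprehension. It also shows `"".join( )`, a built-in string method, as one possible way to glue words together. The variable names are made up for this illustration.

```python
fruits = ["banana", "kiwi", "plum"]

# Map each word to its length, then use the number functions on the result.
lengths = [
    len(fruit)
    for fruit in fruits
]

print(lengths)
print("total letters:", sum(lengths))
print("longest word has", max(lengths), "letters")

# To combine the words themselves, join them instead of summing them.
print("".join(fruits))
```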

## Extra scenarios.

### Compare two groups:

- What percent of coffees are 'Latte'.

In [None]:
coffee_orders = [ {'type': 'Latte', 'milk': 'Cow', 'sugar': 4}, 
                 {'type': 'Lungo', 'milk': 'None', 'sugar': 1}, 
                 {'type': 'Latte', 'milk': 'Oat', 'sugar': 2} ]

all_lattes = [ 
    order
    for order in coffee_orders 
    if order['type'] == 'Latte'
]

number_of_lattes = len( all_lattes )
number_of_all_coffees = len( coffee_orders )

print("number of lattes", number_of_lattes)
print("percent of lattes", 100 * number_of_lattes / number_of_all_coffees, "%")  # Multiply by 100 to turn a fraction into a percentage.

### Compare two groups:

- Who drinks more sugar - a) All Vegans OR b) Cow-based-Latte-drinkers?

In [None]:
coffee_orders = [ {'type': 'Latte', 'milk': 'Cow', 'sugar': 2}, 
                 {'type': 'Lungo', 'milk': 'None', 'sugar': 1}, 
                 {'type': 'Latte', 'milk': 'Oat', 'sugar': 2} ]

sugars_in_vegan_coffees = [ 
    order['sugar']
    for order in coffee_orders 
    if order['milk'] != 'Cow'
]

sugars_in_cow_lattes = [ 
    order['sugar']
    for order in coffee_orders 
    if order['type'] == 'Latte' and order['milk'] == 'Cow'
]

if sum(sugars_in_vegan_coffees) > sum(sugars_in_cow_lattes):
    print("There's more total sugar used in vegan coffees than in cow-milk lattes")
else:
    print("Vegan coffees do not use more total sugar than cow-milk lattes")


# ⛏ Minitask 1: Filter and modify/map some simple lists.

Remember every now and then to save your progress (File > Save, or a keyboard shortcut, OR click the 'Disc' icon at the top left of your Noteable toolbar). 

Given the below lists, write list comprehensions that will change them into different formats. 

Write 3-4 list comprehensions for each, and try to keep them different from the examples within this Notebook.

Your datasets are: 

- Part of Romeo and Juliet by Shakespeare from http://www.gutenberg.org/files/1513/1513-0.txt (I already pasted it for you below, so don't worry about getting the text by yourself).
- Random sample of currency exchange values from pound to euro.

Some inspiration, but feel free to come up with your own scenarios:

- Whose name is mentioned more (Romeo or Juliet)? Remember that you can filter a list and then ask for `len( )` of the result.
- What are the words longer than 5 characters starting with the letter 'M'?
- On how many days was the pound worth more than 1.1 euro?
- How many euros could you buy for 100 pounds on the days when the pound was worth more than 1.1 euro?

In [None]:
shakespeare = ['JULIET', '.', '’', 'Tis', 'but', 'thy', 'name', 'that', 'is', 'my', 'enemy', ';',
              'Thou', 'art', 'thyself', ',', 'though', 'not', 'a', 'Montague', '.', 'What', '’', 's',
              'Montague', '?', 'It', 'is', 'nor', 'hand', 'nor', 'foot', ',', 'Nor', 'arm', ',', 'nor', 
              'face', ',', 'nor', 'any', 'other', 'part', 'Belonging', 'to', 'a', 'man', '.',
              'O', 'be', 'some', 'other', 'name', '.', 'What', '’', 's', 'in', 'a', 'name', '?',
              'That', 'which', 'we', 'call', 'a', 'rose', 'By', 'any', 'other', 'name', 'would', 'smell', 
              'as', 'sweet', ';', 'So', 'Romeo', 'would', ',', 'were', 'he', 'not', 'Romeo', 'call', '’',
              'd', ',', 'Retain', 'that', 'dear', 'perfection', 'which', 'he', 'owes', 'Without', 'that', 
              'title', '.', 'Romeo', ',', 'doff', 'thy', 'name', ',', 'And', 'for', 'thy', 'name', ',',
              'which', 'is', 'no', 'part', 'of', 'thee', ',', 'Take', 'all', 'myself', '.']

pound_to_euro = [1.079974, 1.084409, 1.087774, 1.096891, 1.094839, 1.091664, 1.090948,
                 1.090948, 1.089578, 1.086815, 1.092255, 1.093133, 1.0926, 1.095731, 
                 1.095731, 1.102182, 1.095389, 1.101342, 1.098181, 1.103804, 1.103819,
                 1.103819, 1.101216, 1.099458, 1.097423, 1.10108, 1.101352, 1.102922, 1.102922, 1.106241]

In [None]:
# Here write your solution! E.g. for these questions:
# - Whose name is mentioned more (Romeo or Juliet)? Remember that you can filter a list and then ask for `len( )` of the result.
# - What are the words longer than 5 characters starting with the letter 'M'?
# - On how many days was the pound worth more than 1.1 euro?
# - How many euros could one buy for 100 pounds on the days when the pound was worth more than 1.1 euro?



 
# ⛏ Minitask 2: Extract and filter some information from JSON information about cash machines.

We've seen this dataset in another badge. Ask 2-3 business questions and answer them using a list comprehension.

Some inspiration, but feel free to come up with your own scenarios:

- What are the names of the cities where these ATMs are based?
- What is the typical number of currencies supported by an ATM?
- What are the names of the cities of ATMs that dispense 5 pound notes? (MinimumPossibleAmount is 5)

In [None]:
import pprint as pp

bank_of_scotland_atms = [{'Identification': 'BFF7BC11',
  'SupportedLanguages': ['eng', 'spa', 'ger', 'fre'],
  'ATMServices': ['PINUnblock',
   'Balance',
   'BillPayments',
   'CashWithdrawal',
   'FastCash',
   'MobilePhoneTopUp',
   'PINChange',
   'MiniStatement'],
  'Accessibility': ['WheelchairAccess'],
  'SupportedCurrencies': ['GBP'],
  'MinimumPossibleAmount': '5',
  'Branch': {'Identification': '80453100'},
  'Location': {'LocationCategory': ['BranchExternal'],
   'Site': {'Identification': '80453100'},
   'PostalAddress': {'AddressLine': ['136 BUCHANAN STREET; BALFRON'],
    'BuildingNumber': 'BOS BRANCH',
    'StreetName': '136 BUCHANAN STREET',
    'TownName': 'GLASGOW',
    'CountrySubDivision': ['GLASGOW'],
    'Country': 'GB',
    'PostCode': 'G63 0TG',
    'GeoLocation': {'GeographicCoordinates': {'Latitude': '56.071629',
      'Longitude': '-4.336911'}}}}},
 {'Identification': 'BFA6HC11',
  'SupportedLanguages': ['eng', 'spa', 'ger', 'fre'],
  'ATMServices': ['PINUnblock',
   'Balance',
   'BillPayments',
   'CashWithdrawal',
   'FastCash',
   'MobilePhoneTopUp',
   'PINChange',
   'MiniStatement'],
  'Accessibility': ['AudioCashMachine', 'WheelchairAccess'],
  'SupportedCurrencies': ['GBP'],
  'MinimumPossibleAmount': '10',
  'Branch': {'Identification': '80496000'},
  'Location': {'LocationCategory': ['BranchInternal'],
   'Site': {'Identification': '80496000'},
   'PostalAddress': {'AddressLine': ['BRISTOL ROOM COPLEY DATA CENTRE; WAKEFIELD ROAD'],
    'BuildingNumber': 'BOS BRANCH',
    'StreetName': 'BRISTOL ROOM COPLEY DATA CENTRE',
    'TownName': 'HALIFAX',
    'CountrySubDivision': ['WEST YORKSHIRE'],
    'Country': 'GB',
    'PostCode': 'HX3 0TD',
    'GeoLocation': {'GeographicCoordinates': {'Latitude': '51.454900',
      'Longitude': '2.592000'}}}}},
 {'Identification': 'BF88GC11',
  'SupportedLanguages': ['eng', 'spa', 'ger', 'fre'],
  'ATMServices': ['PINUnblock',
   'Balance',
   'BillPayments',
   'CashWithdrawal',
   'FastCash',
   'MobilePhoneTopUp',
   'PINChange',
   'MiniStatement'],
  'Accessibility': ['AudioCashMachine', 'WheelchairAccess'],
  'SupportedCurrencies': ['GBP'],
  'MinimumPossibleAmount': '5',
  'Branch': {'Identification': '80452300'},
  'Location': {'LocationCategory': ['BranchInternal'],
   'Site': {'Identification': '80452300'},
   'PostalAddress': {'AddressLine': ['111 HIGH STREET;'],
    'BuildingNumber': 'BOS BRANCH',
    'StreetName': '111 HIGH STREET',
    'TownName': 'ANNAN',
    'CountrySubDivision': ['DUMFRIES AND GALLOWAY'],
    'Country': 'GB',
    'PostCode': 'DG12 6AB',
    'GeoLocation': {'GeographicCoordinates': {'Latitude': '54.987176',
      'Longitude': '-3.260075'}}}}}]

pp.pprint(bank_of_scotland_atms)

In [None]:
# Here write your solutions. E.g. for these questions:
# What are the names of the cities where these ATMs are based?
# What is the typical number of currencies supported by an ATM?
# What are the names of the cities of ATMs that dispense 5 pound notes? (MinimumPossibleAmount is 5)



## ⭐️⭐️⭐️💥 What you learned in this session: Three stars and a wish.
**In your own words** write in your Learn diary:

- 3 things you would like to remember from this badge.
- 1 thing you wish to understand better in the future or a question you'd like to ask.
