# Introduction
When doing data science, you need a way to organize your data so you can work with it efficiently. Python has many data structures available for holding your data, such as lists, sets, dictionaries, and tuples. In this tutorial, you will learn how to work with Python lists.

# Motivation
In the [Petal to the Metal competition](https://www.kaggle.com/c/tpu-getting-started), your goal is to classify the species of a flower based only on its image. (This is a common task in computer vision, and it is called image classification.) Towards this goal, say you organize the names of the flower species in the data.

One way to do this is by organizing the names in a Python string.

In [1]:
flowers = "pink primrose,hard-leaved pocket orchid,canterbury bells,sweet pea,english marigold,tiger lily,moon orchid,bird of paradise,monkshood,globe thistle"

print(type(flowers))
print(flowers)

<class 'str'>
pink primrose,hard-leaved pocket orchid,canterbury bells,sweet pea,english marigold,tiger lily,moon orchid,bird of paradise,monkshood,globe thistle


Even better is to represent the same data in a Python list. To create a list, you need to use square brackets (``[``, ``]``) and separate each item with a comma. Every item in the list is a Python string, so each is enclosed in quotation marks.

In [2]:
flowers_list = [
    "pink primrose",
    "hard-leaved pocket orchid",
    "canterbury bells",
    "sweet pea",
    "english marigold",
    "tiger lily",
    "moon orchid",
    "bird of paradise",
    "monkshood",
    "globe thistle"
]

print(type(flowers_list))
print(flowers_list)

<class 'list'>
['pink primrose', 'hard-leaved pocket orchid', 'canterbury bells', 'sweet pea', 'english marigold', 'tiger lily', 'moon orchid', 'bird of paradise', 'monkshood', 'globe thistle']


At first glance, it doesn't look too different, whether you represent the information in a Python string or list. But as you will see, there are a lot of tasks that you can more easily do with a list. For instance, a list will make it easier to:

- get an item at a specified position (first, second, third, etc.),
- check the number of items, and
- add and remove items.

# Lists

## Length
We can count the number of entries in any list with ``len()``, which is short for "length". You need only supply the name of the list in the parentheses.

In [3]:
# The list has ten entries
print(len(flowers_list))

10


## Indexing
We can refer to any item in the list according to its position in the list (first, second, third, etc.). This is called **indexing**.

Note that Python uses zero-based indexing, which means that:

- to pull the first entry in the list, you use 0,
- to pull the second entry in the list, you use 1, and
- to pull the final entry in the list, you use one less than the length of the list.

In [4]:
print("First entry:", flowers_list[0])
print("Second entry:", flowers_list[1])

# The list has length ten, so we refer to the final entry with 9
print("Last entry:", flowers_list[9])

First entry: pink primrose
Second entry: hard-leaved pocket orchid
Last entry: globe thistle
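
The rule for the final entry ("one less than the length of the list") can be sketched without hard-coding the number. This is a minimal illustration; the list ``colors`` and the variable ``last_index`` are made up for this example and are not part of the flower data.

```python
# Hypothetical example list (not from the tutorial data)
colors = ["red", "green", "blue", "yellow"]

# The last valid index is always one less than the length
last_index = len(colors) - 1

print("Length:", len(colors))
print("Last index:", last_index)
print("Last entry:", colors[last_index])
```

Using an index equal to the length itself (here, ``colors[4]``) would raise an ``IndexError``, because the valid positions only run from 0 to 3.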


**Side Note**: You may have noticed that in the code cell above, we use a single ``print()`` to print multiple items: both a Python string (like ``"First entry:"``) and a value from the list (like ``flowers_list[0]``). To print multiple things in Python with a single command, we need only separate them with a comma.

## Slicing
You can also pull a segment of a list (for instance, the first three entries or the last two entries). This is called **slicing**. For instance:

- to pull the first ``x`` entries, you use ``[:x]``, and
- to pull the last ``y`` entries, you use ``[-y:]``.

In [5]:
print("First three entries:", flowers_list[:3])
print("Final two entries:", flowers_list[-2:])

First three entries: ['pink primrose', 'hard-leaved pocket orchid', 'canterbury bells']
Final two entries: ['monkshood', 'globe thistle']


As you can see above, when we slice a list, it returns a new, shortened list.
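
Here is a small sketch of the ``[:x]`` and ``[-y:]`` patterns with explicit values for ``x`` and ``y``. The list ``colors`` is a made-up example. It also shows that slicing leaves the original list unchanged.

```python
# Hypothetical example list (not from the tutorial data)
colors = ["red", "green", "blue", "yellow", "purple"]

x = 2  # number of entries to take from the start
y = 3  # number of entries to take from the end

first_x = colors[:x]
last_y = colors[-y:]

print(first_x)      # ['red', 'green']
print(last_y)       # ['blue', 'yellow', 'purple']

# The original list still has all five entries
print(len(colors))  # 5
```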

## Removing items
Remove an item from a list with ``.remove()``, and put the item you would like to remove in parentheses.

In [6]:
flowers_list.remove("globe thistle")
print(flowers_list)

['pink primrose', 'hard-leaved pocket orchid', 'canterbury bells', 'sweet pea', 'english marigold', 'tiger lily', 'moon orchid', 'bird of paradise', 'monkshood']
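
Two details of ``.remove()`` are worth knowing: it removes only the first matching item, and it raises a ``ValueError`` if the item is not in the list. The sketch below uses a made-up list ``fruits`` and the ``in`` operator (not covered in this tutorial) to check for an item before removing it.

```python
# Hypothetical example list with a repeated item
fruits = ["apple", "banana", "apple", "cherry"]

# Only the first "apple" is removed
fruits.remove("apple")
print(fruits)  # ['banana', 'apple', 'cherry']

# "mango" is not in the list, so we check first to avoid a ValueError
if "mango" in fruits:
    fruits.remove("mango")
print(fruits)  # ['banana', 'apple', 'cherry']
```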


## Adding items
Add an item to a list with ``.append()``, and put the item you would like to add in parentheses.

In [7]:
flowers_list.append("snapdragon")
print(flowers_list)

['pink primrose', 'hard-leaved pocket orchid', 'canterbury bells', 'sweet pea', 'english marigold', 'tiger lily', 'moon orchid', 'bird of paradise', 'monkshood', 'snapdragon']
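
As the output above suggests, ``.append()`` places the new item at the end of the list, so the length grows by one and the new item sits at the last index. A minimal sketch with a made-up list ``fruits``:

```python
# Hypothetical example list (not from the tutorial data)
fruits = ["apple", "banana"]
print(len(fruits))  # 2

fruits.append("cherry")

print(len(fruits))              # 3
print(fruits[len(fruits) - 1])  # cherry
```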


## Lists are not just for strings
So far, we have only worked with lists where each item in the list is a string. But lists can have items with any data type, including booleans, integers, and floats.

As an example, consider hardcover book sales in the first week of April 2000 in a retail store.

In [8]:
hardcover_sales = [139, 128, 172, 139, 191, 168, 170]

Here, ``hardcover_sales`` is a list of integers. Just as with a list of strings, you can still do things like get the length, pull individual entries, and extend the list.

In [9]:
print("Length of the list:", len(hardcover_sales))
print("Entry at index 2:", hardcover_sales[2])

Length of the list: 7
Entry at index 2: 172
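
The same operations should carry over to the other data types mentioned above. The sketch below uses two made-up lists, ``prices`` (floats) and ``in_stock`` (booleans), purely for illustration.

```python
# Hypothetical lists of floats and booleans
prices = [4.99, 12.5, 7.25]
in_stock = [True, False, True]

print(len(prices))    # 3
print(prices[1])      # 12.5
print(in_stock[:2])   # [True, False]

# Appending works the same way as with strings
prices.append(3.0)
print(prices)         # [4.99, 12.5, 7.25, 3.0]
```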


You can also get the minimum with ``min()`` and the maximum with ``max()``.

In [10]:
print("Minimum:", min(hardcover_sales))
print("Maximum:", max(hardcover_sales))

Minimum: 128
Maximum: 191


To add every item in the list, use ``sum()``.

In [11]:
print("Total books sold in one week:", sum(hardcover_sales))

Total books sold in one week: 1107


We can also do similar calculations with slices of the list. In the next code cell, we take the sum from the first five days (``sum(hardcover_sales[:5])``), and then divide by five to get the average number of books sold in the first five days.

In [12]:
print("Average books sold in first five days:", sum(hardcover_sales[:5])/5)

Average books sold in first five days: 153.8
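
One way to avoid typing the divisor by hand is to divide by the length of the slice itself. This is only a sketch of that idea; the list ``daily_sales`` and its values are invented for the example.

```python
# Hypothetical sales figures (not from the tutorial data)
daily_sales = [10, 20, 30, 40]

# Take the last two days, then divide the sum by the number of entries in the slice
last_two = daily_sales[-2:]
average_last_two = sum(last_two) / len(last_two)

print(last_two)          # [30, 40]
print(average_last_two)  # 35.0
```

Dividing by ``len(last_two)`` keeps the calculation correct if you later change how many days the slice covers.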


# Practice

## Question 1
You own a restaurant with five food dishes, organized in the Python list ``menu`` below. One day, you decide to:

- remove bean soup (``'bean soup'``) from the menu, and
- add roasted beet salad (``'roasted beet salad'``) to the menu.

Implement this change to the list below. While completing this task,

- do not change the line that creates the ``menu`` list.
- your answer should use ``.remove()`` and ``.append()``.

In [15]:
# Do not change: Initial menu for your restaurant
menu = [
    'stewed meat with onions',
    'bean soup',
    'risotto with trout and shrimp',
    'fish soup with cream and onion',
    'gyro'
]

# TODO: remove 'bean soup', and add 'roasted beet salad' to the end of the menu

from tests.intro_to_lists import expected_menu
assert menu == expected_menu

## Question 2
The list ``num_customers`` contains the number of customers who came into your restaurant every day over the last month (which lasted thirty days). Fill in values for each of the following:

- ``avg_first_seven`` - average number of customers who visited in the first seven days
- ``avg_last_seven`` - average number of customers who visited in the last seven days
- ``max_month`` - number of customers on the day that got the most customers in the last month
- ``min_month`` - number of customers on the day that got the fewest customers in the last month

Answer this question by writing code. For instance, if you have to find the minimum value in a list, use ``min()`` instead of scanning for the smallest value and directly filling in a number.

In [14]:
# Do not change: Number of customers each day for the last month
num_customers = [
    137, 147, 135, 128, 170, 174, 165, 146, 126, 159,
    141, 148, 132, 147, 168, 153, 170, 161, 148, 152,
    141, 151, 131, 149, 164, 163, 143, 143, 166, 171
]

# TODO: Fill in values for the variables below
avg_first_seven = ...
avg_last_seven = ...
max_month = ...
min_month = ...

from tests.intro_to_lists import question_2
assert avg_first_seven == question_2["avg_first_seven"]
assert avg_last_seven == question_2["avg_last_seven"]
assert max_month == question_2["max_month"]
assert min_month == question_2["min_month"]

## Question 3
In the tutorial, we gave an example of a Python string with information that would have been better stored as a list.

In [16]:
flowers = "pink primrose,hard-leaved pocket orchid,canterbury bells,sweet pea,english marigold,tiger lily,moon orchid,bird of paradise,monkshood,globe thistle"

You can actually use Python to quickly turn this string into a list with ``.split()``. In the parentheses, we need to provide the character that should be used to mark the end of one list item and the beginning of another, and enclose it in quotation marks. In this case, that character is a comma.

In [18]:
flowers.split(",")

['pink primrose',
 'hard-leaved pocket orchid',
 'canterbury bells',
 'sweet pea',
 'english marigold',
 'tiger lily',
 'moon orchid',
 'bird of paradise',
 'monkshood',
 'globe thistle']
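
The separator does not have to be a comma. As a small sketch, the made-up string ``colors_string`` below uses semicolons between items, so a semicolon is passed to ``.split()``.

```python
# Hypothetical string with items separated by semicolons
colors_string = "red;green;blue"

colors_list = colors_string.split(";")

print(colors_list)       # ['red', 'green', 'blue']
print(len(colors_list))  # 3
```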

Now it is your turn to try this out! Create two Python lists:

- ``letters`` should be a Python list where each entry is an uppercase letter of the English alphabet. For instance, the first two entries should be ``"A"`` and ``"B"``, and the final two entries should be ``"Y"`` and ``"Z"``. Use the string ``alphabet`` to create this list.
- ``formatted_address`` should be a Python list where each row in ``address`` is a different item in the list. Currently, each row in ``address`` is separated by a comma.

In [19]:
# Do not change: Define two Python strings
alphabet = "A.B.C.D.E.F.G.H.I.J.K.L.M.N.O.P.Q.R.S.T.U.V.W.X.Y.Z"
address = "Mr. H. Potter,The cupboard under the Stairs,4 Privet Drive,Little Whinging,Surrey"

# TODO: Convert strings into Python lists
letters = ...
formatted_address = ...

from tests.intro_to_lists import expected_alphabet, expected_address
assert letters == expected_alphabet
assert formatted_address == expected_address

## Question 4
In the Python course, you'll learn all about **list comprehensions**, which allow you to create a list based on the values in another list. In this question, you'll get a brief preview of how they work.

Say we're working with the list below.

In [23]:
test_ratings = [1, 2, 3, 4, 5]

Then we can use this list (``test_ratings``) to create a new list (``test_liked``) where each item has been turned into a boolean, depending on whether or not the item is greater than or equal to four.

In [24]:
test_liked = [i>=4 for i in test_ratings]
print(test_liked)

[False, False, False, True, True]
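
The general pattern is ``[expression for item in some_list]``: the expression is evaluated once per item, and the results are collected into a new list. The sketch below applies that pattern to a made-up list ``temperatures_c``; the names and values are assumptions for illustration only.

```python
# Hypothetical list of temperatures in degrees Celsius
temperatures_c = [0, 10, 25]

# Convert each value to Fahrenheit
temperatures_f = [c * 9 / 5 + 32 for c in temperatures_c]
print(temperatures_f)  # [32.0, 50.0, 77.0]

# Turn each value into a boolean, as in the ratings example
is_warm = [c >= 20 for c in temperatures_c]
print(is_warm)         # [False, False, True]
```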


In this question, you'll use this list comprehension to define a function ``percentage_liked()`` that takes one argument as input:

- ``ratings``: list of ratings that people gave to a movie, where each rating is a number between 1-5, inclusive

We say someone liked the movie if they gave a rating of either 4 or 5. Your function should return the percentage of people who liked the movie.

For instance, if we supply a value of ``[1, 2, 3, 4, 5, 4, 5, 1]``, then 50% (4/8) of the people liked the movie, and the function should return ``0.5``.

Part of the function has already been completed for you. You need only use ``list_liked`` to calculate ``percentage_liked``.

In [26]:
def percentage_liked(ratings):
    list_liked = [i >= 4 for i in ratings]
    percentage_liked = ...
    return percentage_liked

assert percentage_liked([1, 2, 3, 4, 5, 4, 5, 1]) == 0.5

## Question 5
Say you're doing analytics for a website. You need to write a function that returns the percentage growth in the total number of users relative to a specified number of years ago.

Your function ``percentage_growth()`` should take two arguments as input:

- ``num_users`` = Python list with the total number of users each year. So ``num_users[0]`` is the total number of users in the first year, ``num_users[1]`` is the total number of users in the second year, and so on. The final entry in the list gives the total number of users in the most recently completed year.
- ``yrs_ago`` = number of years to go back in time when calculating the growth percentage

For instance, say ``num_users = [920344, 1043553, 1204334, 1458996, 1503323, 1593432, 1623463, 1843064, 1930992, 2001078]``.

- if ``yrs_ago = 1``, we want the function to return a value of about ``0.036``. This corresponds to a percentage growth of approximately 3.6%, calculated as (2001078 - 1930992)/1930992.
- if ``yrs_ago = 7``, we would want to return approximately ``0.66``. This corresponds to a percentage growth of approximately 66%, calculated as (2001078 - 1204334)/1204334.

In [None]:
def percentage_growth(num_users, yrs_ago):
    growth = ...
    return growth

# Do not change: Variable for calculating some test examples
num_users_test = [920344, 1043553, 1204334, 1458996, 1503323, 1593432, 1623463, 1843064, 1930992, 2001078]

# Do not change: Should return approx .036
print(percentage_growth(num_users_test, 1))

# Do not change: Should return approx 0.66
print(percentage_growth(num_users_test, 7))

# Congratulations!
Congratulations on finishing the Intro to Programming course! You should be proud of your very first steps in learning programming. As a next step, we recommend taking the [Python Course](..\02_python\01_hello_python.ipynb).