# Dataquest.io Path: DATA SCIENTIST IN PYTHON

# Step 1 of 8: PYTHON INTRODUCTION

## Course 1/2: Python for Data Science: Fundamentals

## Mission 1/9: Programming in Python

### 1. Learning Data Science

### Welcome to Python for Data Science: Fundamentals!

In this course, we’ll learn the fundamentals of data science and Python programming. We'll then use this knowledge to analyze data about thousands of Android and Google Play mobile apps. We’ve designed this course for beginners — no previous math, stats, or programming experience required — so you can start learning right away.

At the end of the course, you’ll have a portfolio-worthy project to share with potential employers, and a certificate of completion to add to your LinkedIn profile.

In this course, we’ll focus primarily on Python programming for data science. In later courses, we'll build on this programming knowledge to learn data visualization, statistics, linear algebra, machine learning, databases, and more. All these courses fit together in a well-organized curriculum, so you don't have to get to the end of a course and worry about what to do next.

![image.png](attachment:image.png)

### 2. Programming in Python

There's been a lot of discussion and enthusiasm around data science in the last couple of years. Using data science, people have been able to build some amazing technologies:
![image.png](attachment:image.png)

To build data science technologies, we need to give the computer the proper instructions to learn from data. When we give instructions to a computer, we say that we're programming it.

To program a computer, we need to write the instructions in a special language, which we call a programming language. In this course, we'll learn Python, the most popular programming language for data science.

In the exercise below, we're going to instruct the computer to add two numbers together.

## Instructions

1. Instruct the computer to add two numbers together: 23 + 7.

In [2]:
23 + 7

30

### 3. The print() Command

In the previous exercise, we instructed the computer to perform a single calculation: 23 + 7. The computer followed the instruction and returned 30 as a result.

However, we can ask the computer to perform more than just one calculation.

In the diagram below, for example, we instruct the computer to perform three calculations:

![image.png](attachment:image.png)

The computer returned only 50 as a result, and it may look like it only performed the last addition, 12 + 38.

In reality, the computer performed all three calculations, but it only displayed the result for the last one. To display all three results, we need to use the print() command, like this:
![image.png](attachment:image.png)
Now let's practice using the print() command.
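
Before the exercise, here is a minimal sketch of three calculations, each wrapped in print() so that every result is displayed. The additions 23 + 7 and 12 + 38 are the ones mentioned above; the middle calculation, 10 - 6, is our own choice for illustration. The text after each # symbol is only a note for the reader.

```python
# Each instruction sits on its own line.
# print() displays every result, not just the last one.
print(23 + 7)   # displays 30
print(10 - 6)   # displays 4
print(12 + 38)  # displays 50
```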

## Instructions

Using the print() command, display the result for:

- 40 + 4
- 200 - 25
- 14 + 3

In [29]:
print(40 + 4)
print(200 - 25)
print(14 + 3)

44
175
17


### 4. Python Syntax

Previously, we sent the computer three instructions and wrote each on a separate line. If we were to put them all on the same line, we'd get an error:

In [8]:
print(40 + 4) print(200 - 25) print(14 + 3)

SyntaxError: invalid syntax (<ipython-input-8-6ca4974037c6>, line 1)

Putting all three print() commands on one line, as in print(40 + 4) print(200 - 25) print(14 + 3), resulted in red text describing a syntax error. This is because all programming languages, Python included, have syntax rules. Each line of instructions has to comply with these rules.

This is similar to the syntax rules we have in human languages. If you want to use English to tell a friend that you like data science, you need to respect syntax rules to correctly convey your message. Your friend will understand "I like data science," but not "science I data like." Similarly, the computer didn't understand print(40 + 4) print(200 - 25) print(14 + 3) because the syntax was incorrect.

Running into errors is common when we're programming, and we'll constantly learn about errors and how to fix them. Now let's practice giving the computer more instructions.

## Instructions

Run the instructions below in the code editor. Remember that each instruction must be on a separate line.

In [9]:
print(30 + 10 + 40)
print(4)
print(-3)

80
4
-3


### 5. Computer Programs

All these instructions are collectively known as __code__. And each line of instruction is known as a __line of code__.
![image.png](attachment:image.png)

When we write code, we program the computer to do something. For this reason, we also call the code we write a __computer program__, or a __program__. The program we wrote in the previous exercise had three lines of code, but a program can be as small as one line.

The code we write serves as __input__ to the computer. The result of executing the code is called __output__.
![image.png](attachment:image.png)

As we've learned, we can display the output using the print() command. 
Note, however, that the code editor displays the output of the last line of code regardless of whether we use print() or not. In the example below, we see the output of 2 + 3 is only displayed when 2 + 3 is the last line of code.
![image.png](attachment:image.png)

## Instructions
By using the print() command, write a program that has three lines of code and:

* Displays the result of 34 + 16.
* Displays the number 34.
* Displays the number -34.

In [13]:
print(34 + 16)
print(34)
print(-34)

50
34
-34


### 6. Code Comments

The computer executes code from the first line downwards and ignores blank lines.
![image.png](attachment:image.png)

Besides blank lines, the computer also ignores any sequence of characters that comes to the right of the # symbol. In the example below, we use # before print(5 + 1), and we see the output of print(5 + 1) is not displayed anymore — this is because print(5 + 1) is not executed when it's preceded by #.
![image.png](attachment:image.png)

The sequence of characters that follows the # symbol is called a __code comment__. We can also use code comments to add information about our code:
![image.png](attachment:image.png)

Another way we could use code comments is adding a general description at the beginning of our program.
![image.png](attachment:image.png)
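
As a rough sketch of the three uses of comments described above (a general description at the top, a note next to a line of code, and a line of code switched off with #), we might write something like the program below. The numbers are illustrative, apart from print(5 + 1), which is the example mentioned earlier.

```python
# A short program that displays two sums.
# Lines that start with # are comments and are not executed.

print(1 + 2)    # a note to the right of the code; displays 3

# print(5 + 1)  <- commented out, so 6 is never displayed

print(3 + 4)    # displays 7
```

Running this program displays only 3 and 7.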

## Instructions
#print(34 + 16)
#print(34)
#print(-34)

Uncomment these three lines of code by removing the # symbols, and then run the code

In [15]:
print(34 + 16)
print(34)
print(-34)

50
34
-34


### 7. Arithmetic Operations
Previously, we wrote programs that only performed additions and subtractions. We can also perform multiplication and division in Python. To perform multiplication, we need to use the * character. For instance, this is how we multiply 3 by 2:

![image.png](attachment:image.png)

To perform division, we use the / character. This is how we divide 3 by 2:

![image.png](attachment:image.png)

We can also perform exponentiation (raising a number to a power) by using **. For example, this is how we can raise 4 to the power of 2 (in mathematical notation, we'd write this as 4<sup>2</sup>).

![image.png](attachment:image.png)

The arithmetical operations we do in Python follow the usual order of operations we know from mathematics. Parentheses are calculated first, then exponentiation, then division and multiplication, and finally, addition and subtraction.

![image.png](attachment:image.png)

Looking at the code example above, we can deduce from the first operation (4 + 2 * 10) and its corresponding result (24) that multiplication precedes addition. However, for the second operation ((4 + 2) * 10), the addition is calculated first because this time it's surrounded by parentheses. Consequently, the result is 60.

So far we've used space characters between numbers and operators (+, -, *, /, ** are operators). For instance, we've used 4 + 5 instead of 4+5. But Python's syntax rules do not enforce this, so both 4 + 5 and 4+5 will run correctly. However, we encourage you to use spaces in your own code as this helps with readability.
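
The operations discussed in this screen can be collected into one small program. This is only a sketch that reuses the numbers from the text above; the comments show the value each line displays.

```python
print(3 * 2)         # multiplication: 6
print(3 / 2)         # division: 1.5
print(4 ** 2)        # exponentiation: 16

# Order of operations: multiplication comes before addition,
# unless parentheses say otherwise.
print(4 + 2 * 10)    # 24
print((4 + 2) * 10)  # 60

# Spaces around operators are optional, but they help readability.
print(4 + 5)         # 9
print(4+5)           # 9
```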

## Instructions
Write a program with three lines of code that performs the following arithmetical operations and displays the results (you'll need to use the print() command to display the results):
- 16 × 10
- 48 ÷ 5
- 5<sup>3</sup>
 (make sure you don't type 5^3 because ^ is not the symbol for exponentiation)

In [16]:
print(16 * 10)
print(48 / 5)
print(5 ** 3)

160
9.6
125


### 8. Next Steps
At Dataquest, each course is composed of missions, and this screen marks the end of the first mission. Congratulations!

So far, we've focused on learning how to successfully run simple computer programs in Python. In the next mission, we'll use this fundamental knowledge to learn more about programming and data science:

* We'll learn how to save our data using variables.
* We'll learn how to work better with numerical data.
* We'll learn how to work with text data.

## Mission 2/9: Variables and Data Types

### 1. Saving Values
In the first mission, we learned the basics of Python programming and performed a few arithmetical operations using Python. In this mission, we'll learn how to save values, and how to work with numerical and text data.

Let's say we want to save the result of an arithmetical operation for later work. For instance, (8 + 2) * 2 equals 20, and we want to save 20. This is the code we need to run to save 20:
![image.png](attachment:image.png)

If we print the name result, the output is 20:
![image.png](attachment:image.png)

We can also save (8 + 2) * 2 directly instead of saving 20.
![image.png](attachment:image.png)

Notice, however, that print(result) outputs 20, not (8 + 2) * 2. This is because the computer first calculates (8 + 2) * 2 and then saves the result 20 to result.
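
A minimal sketch of the steps described above, using the same name (result) and the same calculation:

```python
# The right-hand side is calculated first, then the value 20 is saved.
result = (8 + 2) * 2

# Printing the name displays the saved value, not the original expression.
print(result)  # displays 20
```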

## Instructions
1. Save the result of (42 - 11) * 22 to result.
2. Print result.

In [18]:
result = (42 - 11) * 22
print(result)

682


### 2. Variables
Previously, we saved 20 to result.
![image.png](attachment:image.png)

When we run the code result = 20, the value 20 is saved in the computer memory. The computer memory has many storage locations, and 20 is saved to one particular location.
![image.png](attachment:image.png)

The storage location to which we saved 20 has a unique identifier, and we can use it to access 20. The identifier is named result, and we named it that way when we ran the code result = 20. We can use the identifier result to access 20 in other lines of code:
![image.png](attachment:image.png)

The storage location for 20 is more commonly known as a variable. When we ran the code result = 20, we stored 20 in a variable (storage location) named result — so result is a variable name.

Note that we need to write the variable name to the left of the = operator and the value we want to store to the right. So if we want to store the value 20 to a variable named result, we must write result = 20, not 20 = result.

We chose the name result arbitrarily, but we could have chosen something different:
![image.png](attachment:image.png)

## Instructions
1. Store the value 15 in a variable named a_value.
2. Store the result of (25 - 7) * 17 to a variable named a_result.
3. Using the print() command, display the following:

 - The value stored in the a_value variable.
 - The result of adding 12 to the variable a_result.
 - The result of adding a_value to a_result.

In [19]:
a_value = 15
a_result = (25 - 7) * 17
print(a_value)
print(a_result + 12)
print(a_value + a_result)

15
318
321


### 3. Variable Names
In the last screen, we learned that we can choose different names for variables. However, the names we can use must comply with a number of syntax rules. For instance, naming a variable a result will output a syntax error because we're not allowed to use space characters in variable names.
![image.png](attachment:image.png)

These are the two syntax rules we need to be aware of when we're naming variables:

1. We must use only letters, numbers, or underscores (we can't use apostrophes, hyphens, whitespace characters, etc.).
2. Variable names cannot start with a number.

![image.png](attachment:image.png)

Note that variable names are case sensitive, which means that a variable named result is different from a variable named Result:
![image.png](attachment:image.png)
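
The naming rules above can be sketched as follows. The name app_rating_2 is a hypothetical example of ours, and the invalid names are left as comments because uncommenting any of them would stop the program with a syntax error.

```python
# Valid names: only letters, numbers, and underscores; no leading number.
result = 20
Result = 30          # a different variable, because names are case sensitive
app_rating_2 = 4.5   # hypothetical name; numbers are fine after the first character

print(result)  # displays 20
print(Result)  # displays 30

# Invalid names (each would cause a SyntaxError if uncommented):
# a result = 20       (contains a space)
# old-income = 34000  (contains a hyphen)
# 1_result = 20       (starts with a number)
```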

## Instructions
In the code editor on the right, we attempted to store 34000 in a variable named old-income, and 40000 in a variable named new income. But both of these variable names cause syntax errors, so we commented out the code.

1. Change the variable name old-income to old_income to prevent a syntax error.
2. Change the variable name new income to new_income to prevent a syntax error.
3. Remove the # from each line so that the code will run, then run the code.

In [21]:
# INITIAL CODE
old-income = 34000
new-income = 40000

SyntaxError: cannot assign to operator (<ipython-input-21-9e8081205a27>, line 2)

In [22]:
# CHANGED CODE
old_income = 34000
new_income = 40000

### 4. Updating Variables
The value stored in a variable can be updated. Below, we first store 30 in the variable x, and then we update x to store 70 instead.
![image.png](attachment:image.png)

We can also update a variable by doing arithmetical operations:
![image.png](attachment:image.png)

Notice in the code above that:

- The variable x initially stores a value of 30.
- x + 70 evaluates to 100 because x stores a value of 30 — so x + 70 becomes 30 + 70.
- When we run x = x + 70, x is updated to store the result of x + 70, which is 100. Running x = x + 70 is the same as running x = 30 + 70 because x stores 30.
- print(x) outputs 100 after we run x = x + 70.
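
The points above can be traced in a short sketch that uses the same variable x and the same values:

```python
x = 30
print(x)      # displays 30

x = 70        # x now stores 70 instead of 30
print(x)      # displays 70

x = 30        # start again from 30
x = x + 70    # the right side is calculated first: 30 + 70
print(x)      # displays 100
```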

## Instructions
1. Update the variable income by adding 6000 to its current value. The variable income is already shown in the code editor.
2. Print income.

In [23]:
income = 34000

In [24]:
income = income + 6000
print(income)

40000


### 5. Syntax Shortcuts


In the previous screen, we used the code x = x + 70 to update x from 30 to 100.
![image.png](attachment:image.png)

There are several syntax shortcuts we can use to update a variable when we're doing arithmetical operations. In the code above, for instance, we can write x += 70 instead of x = x + 70:
![image.png](attachment:image.png)

Below is a table with some syntax shortcuts we can use:
![image.png](attachment:image.png)
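
As a sketch of how these shortcuts behave, the program below applies the operators +=, -=, *=, /=, and **= one after another to the same variable. The starting value 30 and the += 70 step come from the example above; the other numbers are our own.

```python
x = 30
x += 70    # same as x = x + 70   -> x stores 100
x -= 10    # same as x = x - 10   -> x stores 90
x *= 2     # same as x = x * 2    -> x stores 180
x /= 4     # same as x = x / 4    -> x stores 45.0
x **= 2    # same as x = x ** 2   -> x stores 2025.0
print(x)   # displays 2025.0

# y += 10  <- would fail here, because y has not been given a value yet
```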

Notice that these operators (+=, -=, *=, /=, **=) can only be used to update a variable. This means the variable being updated must already store a value. In other words, the variable must already be defined. When we try to update a variable that we haven't defined, we get an error called __NameError__.
![image.png](attachment:image.png)

This kind of error is different from the syntax error we learned about in the first mission. y += 10 is correct Python syntax, but the computer returns an error because it can't update a variable that hasn't been defined yet. Whenever the syntax is correct but the computer still returns an error for one reason or another, it's called a __runtime error__.

Notice also that we can update a variable using code like x = x + 1. In mathematics, x = x + 1 would be a false statement because x can never be equal to x + 1. This tells us that the = operator doesn't have the same meaning as it does in mathematics.

In Python, the = operator tells us that the value on the right is assigned to the variable on the left. It doesn't tell us anything about equality. We call = an __assignment operator__, and we read code like x = 5 as "five is assigned to x" or "x is assigned five," but not "x equals five."

## Instructions
1. Assign a value of 20 to a variable named variable_1.
2. Assign a value of 20 to a variable named variable_2.
3. Update the value of variable_2 by adding 10 to its current value. You can take advantage of the += operator.
4. Update the value of variable_1 by multiplying its current value by 4. You can take advantage of the *= operator.
5. Display variable_1 and variable_2 using print().

In [25]:
variable_1 = 20
variable_2 = 20
variable_2 += 10
variable_1 *= 4
print(variable_1)
print(variable_2)

80
30


### 6. Integers and Floats
So far, we only worked with integers like 20, -3, 30, etc. We can also make computations with decimal numbers:
![image.png](attachment:image.png)

In mathematics, integers are not the same as decimal numbers, and Python acknowledges this difference. We can use the type() command to see the type of a value, and confirm that Python distinguishes between integers and decimal numbers:
![image.png](attachment:image.png)

Notice that the integer 2 has the _int_ type, and the decimal number 8.5 has the _float_ type.

__All integers have the int type, and all decimal numbers have the float type.__

In computer programming, values are classified into different __types__, or __data types__. The type of a value tells the computer how that value should be handled. Depending on the type, the computer will know how to store a value in memory, or what operations can and can't be performed on a value.

_int_ and _float_ values have different types, but we can mix the values together in arithmetical operations. So we're not limited, for instance, to adding an int value only to another int value; we can also add an int value to a float value:
![image.png](attachment:image.png)
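
A short sketch of the type() command and of mixing the two types, using the values 2 and 8.5 mentioned above (the variable name total is ours):

```python
print(type(2))      # displays <class 'int'>
print(type(8.5))    # displays <class 'float'>

# Adding an int to a float is allowed; the result is a float.
total = 2 + 8.5
print(total)        # displays 10.5
print(type(total))  # displays <class 'float'>
```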


## Instructions
1. Assign the integer 10 to a variable named variable_1.
2. Assign the float 2.5 to a variable named variable_2.
3. Update the value of variable_1 by adding the float 6.5 to its current value. You can use the += operator.
4. Update the value of variable_2 by multiplying its current value by the integer 2. You can use the *= operator.
5. Display variable_1 and variable_2 using print().

In [27]:
variable_1 = 10
variable_2 = 2.5
variable_1 += 6.5
variable_2 *= 2
print(variable_1)
print(variable_2)

16.5
5.0


### 7. Conversion between Types
It's possible to convert a float to an integer and vice versa. To convert an integer to a float, we can use the float() command. To convert a float to an integer, we can use the int() command:
![image.png](attachment:image.png)

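Notice the int() command rounded 4.3 down to 4. int() doesn't round to the nearest integer: it simply drops the decimal part, so a positive float is always rounded down, even if the digit after the decimal point is greater than five. (For a negative float, dropping the decimal part moves the number toward zero, so int(-4.3) gives -4.)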
![image.png](attachment:image.png)

If we want to round off a number, we can instead use the round() command, which has more flexibility and can also round up:
![image.png](attachment:image.png)

Note that it's possible to combine commands. For instance, we can nest a round() command inside a print() command. This is useful in some cases: if we wrote three round() commands one after another, only the output of the last one would be displayed:
![image.png](attachment:image.png)

Note that running the round() command doesn't change the value stored by a variable unless we assign the rounded value back to the variable:
![image.png](attachment:image.png)
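
The conversions in this screen can be summarized in a short sketch. The value 4.3 comes from the text above; 4.8 and the variable name value are our own choices for illustration.

```python
print(float(4))    # int to float: displays 4.0
print(int(4.3))    # float to int: displays 4
print(int(4.8))    # still displays 4, because the decimal part is dropped

print(round(4.3))  # displays 4
print(round(4.8))  # displays 5

value = 4.8
round(value)          # the rounded number is not saved anywhere
print(value)          # displays 4.8

value = round(value)  # assign the rounded number back to the variable
print(value)          # displays 5
```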

## Instructions
1. Assign the value 13.9 to a variable named variable_a.
2. Assign the value 2.8 to a variable named variable_b.
3. Round variable_a using the round() command, and assign back the rounded value to variable_a.
4. Convert variable_b from a float to an integer using the int() command, and assign back the converted value to variable_b.
5. Display variable_a and variable_b using the print() command.

In [28]:
variable_a = 13.9
variable_b = 2.8
variable_a = round(variable_a)
variable_b = int(variable_b)
print(variable_a, variable_b)

14 2


### 8. Strings
So far, we've only dealt with int and float values. But in data science, numbers are not the only type of data we work with. For instance, consider the table below, which provides some information about five mobile applications from the iOS store:
![image.png](attachment:image.png)

Data source: Mobile App Store data set (Ramanathan Perumal) https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps

We can see the data in columns track_name and currency are represented using text, not numbers. In Python, we can create text by enclosing a sequence of characters within quotation marks (" "):
![image.png](attachment:image.png)

Python syntax allows both double quotation marks (" ") and single quotation marks (' '). So if we want to create the word "Facebook," we can use either "Facebook", or 'Facebook'.
![image.png](attachment:image.png)

In programming, we call sequences of characters like "Facebook", "USD", or "dasdaslkj" strings. In Python, a string is of the str type:
![image.png](attachment:image.png)

When we create strings, we're not limited to using letters — we can also use numbers, spaces, or other characters:
![image.png](attachment:image.png)
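
A minimal sketch of creating strings, using the "Facebook" and "USD" values from the table above. The variable names and the last string are hypothetical examples of ours.

```python
app_name = "Facebook"   # double quotation marks
currency = 'USD'        # single quotation marks work the same way
same_word = 'Facebook'  # the same string as "Facebook"

print(app_name)         # displays Facebook
print(type(app_name))   # displays <class 'str'>

# Strings can also contain numbers, spaces, and other characters.
mixed = "Game 2: the $1 app!"
print(mixed)            # displays Game 2: the $1 app!
```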

## Instructions
1. Assign the string Pandora - Music & Radio to a variable named app_name.
2. Assign the string 4.0 to a variable named average_rating. Make sure you don't mistake a string for a float.
3. Assign the string 1724546 to a variable named total_ratings. Make sure you don't mistake a string for an integer.
4. Assign the string free to a variable named price.
5. Display the app_name variable using print().

In [29]:
app_name = "Pandora - Music & Radio"
average_rating = "4.0"
total_ratings = "1724546"
price = "free"
print(app_name)

Pandora - Music & Radio


### 9. Escaping Special Characters
Sometimes we'll need to create strings with quotation marks inside, like in this example: Facebook's old motto was 'move fast and break things'.

In situations like these, we need to alternate double quotation marks (" ") with single quotation marks (' '):

![image.png](attachment:image.png)

Above, we started the string with a double quotation mark. This lets Python know the string ends where the second double quotation mark is. As a consequence, Python considers the single quotation marks in 'move fast and break things' as being part of the string.
![image.png](attachment:image.png)

However, we may want to surround the motto move fast and break things with double quotation marks: Facebook's old motto was "move fast and break things.". One solution is using single quotation marks to specify the start and the end of the string. However, the single quotation mark in Facebook's will cause Python to think that the string ends there.
![image.png](attachment:image.png)

Creating the string above will result in a syntax error because Python is confused about what comes after the string.
![image.png](attachment:image.png)

Fortunately, we can cancel the special function of the second single quotation mark (its special function is to end the string) by typing a backslash character (\) in front of it:
![image.png](attachment:image.png)

The \ character has a special function within a string: it escapes (cancels) the special function of characters. Above, we used \ to escape the second single quotation mark, which had the special function of ending the string.
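
Both approaches can be sketched side by side. The variable names below are ours; the mottos are the ones quoted above.

```python
# Double quotation marks outside, single quotation marks inside:
motto_1 = "Facebook's old motto was 'move fast and break things'."
print(motto_1)

# Single quotation marks outside: the apostrophe in Facebook's
# must be escaped with a backslash so it doesn't end the string.
motto_2 = 'Facebook\'s old motto was "move fast and break things".'
print(motto_2)
```

The backslash itself is not displayed; the second print() shows Facebook's old motto was "move fast and break things".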

## Instructions
1. Assign the string Facebook's new motto is "move fast with stable infra." to a variable named motto.
  - Notice there's a . character at the end of Facebook's new motto is "move fast with stable infra." — you'll need to include the . character in your answer.
2. Display the variable motto using print() — displaying motto is required for answer checking.

In [30]:
motto = 'Facebook\'s new motto is "move fast with stable infra."'
print(motto)

Facebook's new motto is "move fast with stable infra."


### 10. String Operations
When we have two or more distinct strings, it's possible to link them together using the + operator:
![image.png](attachment:image.png)

The process of linking two or more strings together is called __concatenation__.

It's also possible to concatenate a string with one or more copies of itself using the * operator, followed by a number which specifies the number of times the string has to be multiplied:
![image.png](attachment:image.png)


We can't perform arithmetical operations between strings and integers, or strings and floats (decimal numbers).
![image.png](attachment:image.png)

The only exception is when we concatenate a string with copies of itself and use code like 'a' * 2. But that's not an arithmetical operation anyway, so this exception is rather syntactical.

If the strings contain characters that form a valid number (like '4', '3.3', '12', etc.), it's possible to convert them to integers or floats first, and then do the arithmetical operations. We can use the int() or float() command to convert a string of type str to a number of type int or float.

![image.png](attachment:image.png)

Note that we can also convert an int or a float to a str using the str() command. Below, we convert the integer 4 to the string '4' (notice the quotation marks in '4').
![image.png](attachment:image.png)

On a side note, strings are displayed without quotation marks when we use the print() command.
![image.png](attachment:image.png)

So far, we've only been working with one-line strings, but we can also write strings over many lines using the triple quotation mark symbol (''' or """).
![image.png](attachment:image.png)

Using triple quotation marks also allows us to use both single and double quotation marks without needing to escape them.
![image.png](attachment:image.png)
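
The string operations from this screen might look like this in code. This is a sketch with illustrative values and variable names of our own; only 'a' * 2 style repetition, the strings '4', '12', and '3.3', and str(4) are taken from the text above.

```python
joined = 'a' + 'b'            # concatenation
repeated = 'a' * 3            # three copies linked together
print(joined)                 # displays ab
print(repeated)               # displays aaa

# 'a' + 2 would cause an error: strings and numbers can't be added.

total = int('4') + int('12')  # convert the strings first, then add
price = float('3.3')
print(total)                  # displays 16
print(price)                  # displays 3.3

as_text = str(4)              # the integer 4 becomes the string '4'
print(as_text)                # displays 4 (print() hides the quotation marks)

# Triple quotation marks: several lines, and no escaping needed.
multi_line = '''It's called "concatenation"
when we link strings together.'''
print(multi_line)
```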

## Instructions
1. Assign the string Facebook's rating is to a variable named facebook.
2. Assign the float 3.5 to a variable named fb_rating.
3. Convert fb_rating from a float to a string using the str() command, and assign the converted value to a new variable named fb_rating_str.
4. Concatenate the strings stored in facebook and fb_rating_str to form the string Facebook's rating is 3.5.
  - Assign the concatenated string to a variable named fb.
  - You'll need to add a space character between Facebook's rating is and 3.5 to avoid ending up with the string Facebook's rating is3.5.
5. Display the fb variable using print() — this is required for answer checking.

In [31]:
facebook = "Facebook's rating is"
fb_rating = 3.5
fb_rating_str = str(fb_rating)
fb = facebook + " " + fb_rating_str
print(fb)

Facebook's rating is 3.5


### 11. Next Steps
In this mission, we learned:

- How to save values using variables
- What integers and floats are
- How to work with text data using strings
- How to convert between different data types

In the next mission, we'll learn new programming concepts while working with a table that has 16 columns and more than 7,000 rows.

![image.png](attachment:image.png)

## Mission 3/9: Lists and For Loops

### 1. Lists
Toward the end of the previous mission, we worked with this table:
![image.png](attachment:image.png)

Each value in the table is a data point. For instance, the first row has five data points:

- Facebook
- 0.0
- USD
- 2974676
- 3.5

A collection of data points makes up a data set. We can understand our entire table above as a collection of data points, so we call the entire table a data set. We can see that our data set has five rows and five columns.

When we work with data sets, we need to store them in the computer memory to be able to retrieve and manipulate the data points. Using what we've learned so far, we might think we could store each data point in a variable — for instance, this is how we might store the first row's data points:
![image.png](attachment:image.png)


Above, we stored:

- The text "Facebook" as a string 
- The price 0.0 as a float
- The text "USD" as a string
- The rating count 2,974,676 as an integer
- The user rating 3.5 as a float

Creating a variable for each data point in our data set would be a cumbersome process. Fortunately, we can store data more efficiently using lists. This is how we can create a list of data points for the first row:
![image.png](attachment:image.png)

To create the list above, we:

- Typed out a sequence of data points and separated each with a comma: 'Facebook', 0.0, 'USD', 2974676, 3.5
- Surrounded the sequence with brackets: ['Facebook', 0.0, 'USD', 2974676, 3.5]

After we created the list, we stored it in the computer's memory by assigning it to a variable named row_1.
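
To make the contrast concrete, here is a sketch of both approaches for the first row. The names track_name and currency are column names from the table; price, rating_count, and user_rating are hypothetical names we chose for the remaining data points.

```python
# One variable per data point: five names to keep track of.
track_name = 'Facebook'
price = 0.0
currency = 'USD'
rating_count = 2974676
user_rating = 3.5

# The same five data points stored together in one list.
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
print(row_1)  # displays ['Facebook', 0.0, 'USD', 2974676, 3.5]
```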

To create a list of data points, we only need to:

- Separate the data points with a comma.
- Surround the sequence of data points with brackets.

Now let's get a little practice with creating lists.

## Instructions
1. Store the second row ('Instagram', 0.0, 'USD', 2161558, 4.5) as a list in a variable named row_2.
2. Store the third row ('Clash of Clans', 0.0, 'USD', 2130805, 4.5) as a list in a variable named row_3.

In [1]:
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]

### 2. Indexing
A list can contain both mixed and identical data types (so far we've learned four data types: integers, floats, strings, and lists). A list like [4, 5, 6] has identical data types (only integers), while the list ['Facebook', 0.0, 'USD', 2974676, 3.5] has mixed data types:

- Two strings ('Facebook', 'USD')
- Two floats (0.0, 3.5)
- One integer (2974676)

The ['Facebook', 0.0, 'USD', 2974676, 3.5] list has five data points. To find the length of a list, we can use the len() command:
![image.png](attachment:image.png)

For small lists, we can just count the data points on our screens to find the length, but the len() command will prove very useful later on, when we work with lists containing thousands of elements (we'll see an actual example later in this mission).

Each element (data point) in a list has a specific number associated with it, called an index number. The indexing always starts at 0, so the first element will have the index number 0, the second element the index number 1, and so on.
![image.png](attachment:image.png)

To quickly find the index of a list element, identify its position number in the list, and then subtract 1. For example, the string 'USD' is the third element of the list (position number 3), so its index number must be 2 since 3 - 1 = 2.

The index numbers help us retrieve individual elements from a list. Looking back at the list row_1 from the code example above, we can retrieve the first element (the string 'Facebook') with the index number 0 by running the code row_1[0].
![image.png](attachment:image.png)

As a side note, you may have noticed above that we used row_1[0] rather than print(row_1[0]). Recall from the first mission that the code editor displays the output of the last line of code regardless of whether we use print() or not.

The syntax for retrieving individual list elements follows the model list_name[index_number]. For instance, the name of our list above is row_1 and the index number of the first element is 0 — following the list_name[index_number] model, we get row_1[0], where the index number 0 is in square brackets after the variable name row_1.
![image.png](attachment:image.png)

This is how we can retrieve each element in row_1:
![image.png](attachment:image.png)

Retrieving list elements makes it easier to perform operations. For instance, we can select the ratings for Facebook and Instagram, and find the average or the difference between the two:
![image.png](attachment:image.png)
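
The ideas in this screen can be sketched with the first two rows of the table. The variable names difference and average are ours.

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]

print(len(row_1))  # displays 5
print(row_1[0])    # first element: displays Facebook
print(row_1[2])    # third element (index 3 - 1 = 2): displays USD

# The user rating is the fifth element, so its index number is 4.
difference = row_2[4] - row_1[4]
average = (row_1[4] + row_2[4]) / 2
print(difference)  # displays 1.0
print(average)     # displays 4.0
```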

## Instructions
Below, you can already see the lists for the first three rows.

The fourth element in each list describes the number of ratings an app has received. Retrieve this fourth element from each list, and then find the average value of the retrieved numbers.

1. Assign the fourth element from the list row_1 to a variable named ratings_1. Don't forget that the indexing starts at 0.
2. Assign the fourth element from the list row_2 to a variable named ratings_2.
3. Assign the fourth element from the list row_3 to a variable named ratings_3.
4. Add the three numbers retrieved together and save the sum to a variable named total.
5. Divide the sum (now saved in the variable total) by 3 to get the average number of ratings for the first three rows. Assign the result to a variable named average.

In [2]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]

In [30]:
ratings_1 = row_1[3]
ratings_2 = row_2[3]
ratings_3 = row_3[3]
total = ratings_1 + ratings_2 + ratings_3
average = total / 3 
print(average)

2422346.3333333335


### 3. Negative Indexing
In Python, we have two indexing systems for lists:

- __Positive indexing__: the first element has the index number 0, the second element has the index number 1, and so on.
- __Negative indexing__: the last element has the index number -1, the second to last element has the index number -2, and so on.
![image.png](attachment:image.png)

In practice, we almost always use positive indexing to retrieve list elements. Negative indexing is useful when we want to select the last element of a list, especially if the list is long and we can't easily tell its length by counting.

![image.png](attachment:image.png)

Notice that if we use an index number that is outside the range of the two indexing systems, we'll get an __IndexError__.

![image.png](attachment:image.png)
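
The sketch below shows both indexing systems on row_1, and what happens when an index falls outside the valid range. Catching the error with try/except is only a convenience so the example runs to the end; the variable names are illustrative:

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]

last = row_1[-1]             # 3.5, the same element as row_1[4]
second_to_last = row_1[-2]   # 2974676, the same element as row_1[3]
first = row_1[-5]            # 'Facebook', the same element as row_1[0]

# For a list with five elements, valid index numbers run from -5 to 4.
try:
    row_1[5]
except IndexError as error:
    error_message = str(error)
    print('IndexError:', error_message)
```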

## Instructions
The last element in each list shows the average rating of each application.

Retrieve the ratings for the first three rows, and then find the average value of all the ratings retrieved.

1. Assign the last element from the list row_1 to a variable named rating_1. Try to take advantage of negative indexing.
2. Assign the last element from the list row_2 to a variable named rating_2.
3. Assign the last element from the list row_3 to a variable named rating_3.
4. Add the three ratings together and save the sum to a variable named total_rating.
5. Divide the total by 3 to get the average rating. Assign the result to a variable named average_rating.

In [4]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]

In [31]:
rating_1 = row_1[-1]
rating_2 = row_2[-1]
rating_3 = row_3[-1]

total_rating = rating_1 + rating_2 + rating_3
average_rating = total_rating / 3
print(average_rating)

4.166666666666667


### 4. Retrieving Multiple List Elements
Oftentimes, we need to retrieve more than one element from a list. Let's say we have the list ['Facebook', 0.0, 'USD', 2974676, 3.5], and we're interested in isolating only the name of the app and the data about ratings (the number of ratings and the rating). This is how we can do that, using what we've learned so far:
![image.png](attachment:image.png)

If we wanted to do this for every app, we'd end up having a lot of variables, making our code lengthy and hard to keep track of. A better solution is to store the data we want in a separate list.
![image.png](attachment:image.png)

Above, we managed to isolate the three list elements of interest in a separate list. To do that, we used the code row_1[0], row_1[3], row_1[-1] to retrieve the first, fourth, and last element, and then we surrounded that part of code with square brackets to create a new list.

## Instructions
1. For Facebook, Instagram, and Pandora — Music & Radio, isolate the rating data in separate lists. Each list should contain the name of the app, the rating count, and the user rating. Don't forget that indexing starts at 0.

 - For Facebook, assign the list to a variable named fb_rating_data.
 - For Instagram, assign the list to a variable named insta_rating_data.
 - For Pandora — Music & Radio, assign the list to a variable named pandora_rating_data.

2. Compute the average user rating for Facebook, Instagram, and Pandora — Music & Radio using the data you stored in fb_rating_data, insta_rating_data, and pandora_rating_data.

 - You'll need to add the ratings together first, and then divide the total by the number of ratings.
 - Assign the result to a variable named avg_rating.
 - As a side note, we could calculate the average rating here a little bit better using the weighted mean — we'll learn about the weighted mean in the statistics courses.

In [32]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]
fb_rating_data = [row_1[0], row_1[3], row_1[-1]]
insta_rating_data = [row_2[0], row_2[3], row_2[4]]
pandora_rating_data = [row_5[0], row_5[3], row_5[4]]


avg_rating = (fb_rating_data[2] + insta_rating_data[2] + pandora_rating_data[2]) / 3
print(avg_rating)

4.0


### 5. List Slicing
In the last exercise, we retrieved the first, fourth, and last list elements to isolate the rating data. We can also retrieve the first three list elements to isolate the pricing data:
![image.png](attachment:image.png)

Instead of selecting element by element, we can use a syntax shortcut:
![image.png](attachment:image.png)

When we select the first n elements (n stands for a number) from a list named a_list, we can use the syntax shortcut a_list[0:n]. In the example above, we needed to select the first three elements from the list row_3, so we used row_3[0:3].

When we selected the first three elements, we sliced a part of the list. For this reason, the process of selecting a part of a list is called __list slicing__.

There are many ways that we might want to slice a list:
![image.png](attachment:image.png)

To retrieve any list slice we want:

1. We first need to identify the first and the last element of the slice.
2. We then need to identify the index numbers of the first and the last element of the slice.
3. Finally we can retrieve the list slice we want by using the syntax a_list[m:n], where:
 - m represents the index number of the first element of the slice; and
 - n represents the index number of the last element of the slice plus one (if the last element has the index number 2, then n will be 3, if the last element has the index number 4, then n will be 5, and so on).
 ![image.png](attachment:image.png)

When we need to select the first or last x elements (x stands for a number), we can use even simpler syntax shortcuts:

 - a_list[:x] when we want to select the first x elements.
 - a_list[-x:] when we want to select the last x elements.
 ![image.png](attachment:image.png)
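
A minimal sketch of these slicing patterns on row_3 (the variable names are illustrative). Notice that the element at index n is never included in a_list[m:n]:

```python
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]

pricing_data = row_3[0:3]   # first three elements: indices 0, 1, 2
middle = row_3[1:4]         # indices 1, 2, 3 (index 4 is excluded)
first_two = row_3[:2]       # shortcut for the first two elements
last_two = row_3[-2:]       # shortcut for the last two elements

print(pricing_data)
print(middle)
print(first_two)
print(last_two)
```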

## Instructions
1. Select the first four elements from row_1 using a list slicing syntax shortcut. Assign the output to a variable named first_4_fb.
2. Select the last three elements from row_1 using a list slicing syntax shortcut. Assign the output to a variable named last_3_fb.
3. From row_5, select the list slice ['USD', 1126879] using a list slicing syntax shortcut. Assign the output to a variable named pandora_3_4.

In [33]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

first_4_fb = row_1[:4]
last_3_fb = row_1[-3:]
pandora_3_4 = row_5[-3:-1]

print(first_4_fb, last_3_fb, pandora_3_4)

['Facebook', 0.0, 'USD', 2974676] ['USD', 2974676, 3.5] ['USD', 1126879]


### 6. List of Lists
Previously, we introduced lists as a better alternative to using one variable per data point. Instead of having a separate variable for each of the five data points 'Facebook', 0.0, 'USD', 2974676, 3.5, we can bundle the data points together into a list, and then store the list in a single variable.

So far, we've been working with a data set having five rows, and we've been storing each row as a list in a separate variable (the variables row_1, row_2, row_3, row_4, and row_5). If we had a data set with 5,000 rows, however, we'd end up with 5,000 variables, which would make our code messy and almost impossible to work with.

To solve this problem, we can store our five variables in a single list:
![image.png](attachment:image.png)

As we can see, data_set is a list that stores five other lists (row_1, row_2, row_3, row_4, and row_5). A list that contains other lists is called a __list of lists__.

The data_set variable is still a list, which means we can retrieve individual list elements and perform list slicing using the syntax we learned. Below, we:

- Retrieve the first list element (row_1) using data_set[0].
- Retrieve the last list element (row_5) using data_set[-1].
- Retrieve the first two list elements (row_1 and row_2) by performing list slicing using data_set[:2].
![image.png](attachment:image.png)

We'll often need to retrieve individual elements from a list that's part of a list of lists — for instance, we may want to retrieve the value 3.5 from ['Facebook', 0.0, 'USD', 2974676, 3.5], which is part of the data_set list of lists. Below, we extract 3.5 from data_set using what we've learned:

- We retrieve row_1 using data_set[0], and assign the result to a variable named fb_row.
- We print fb_row, which outputs ['Facebook', 0.0, 'USD', 2974676, 3.5].
- We retrieve the last element from fb_row using fb_row[-1] (since fb_row is a list), and assign the result to a variable named fb_rating.
- We print fb_rating, which outputs 3.5.
![image.png](attachment:image.png)

Above, we retrieved 3.5 in two steps: we first retrieved data_set[0], and then we retrieved fb_row[-1]. However, there's an easier way to retrieve the same value of 3.5 by chaining the two indices ([0] and [-1]) — the code data_set[0][-1] retrieves 3.5:
![image.png](attachment:image.png)

Above, we've seen two ways of retrieving the value 3.5. Both ways lead to the same output (3.5), but the second way involves less typing because it elegantly combines the steps we see in the first case. While you can choose either option, people generally choose the second one.
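
The steps described in this screen can be sketched as follows. The names first_row, last_row, first_two_rows, and fb_rating_chained are illustrative additions; the rest follows the text:

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

# A list of lists: each element of data_set is itself a list
data_set = [row_1, row_2, row_3, row_4, row_5]

first_row = data_set[0]        # row_1
last_row = data_set[-1]        # row_5
first_two_rows = data_set[:2]  # [row_1, row_2]

# Two steps: retrieve the row first, then the element inside it
fb_row = data_set[0]
fb_rating = fb_row[-1]

# One step: chain the two indices
fb_rating_chained = data_set[0][-1]

print(fb_rating, fb_rating_chained)
```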

## Instructions
1. In the code editor, we've already stored the five rows as lists in separate variables. Group together the five lists in a list of lists. Assign the resulting list of lists to a variable named app_data_set.

2. Compute the average rating of the apps by retrieving the right data points from the app_data_set list of lists.
 - The rating is the last element of each row. You'll need to sum up the ratings and then divide by the number of ratings.
 - Assign the result to a variable named avg_rating.

In [9]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

In [34]:
app_data_set = [row_1, row_2, row_3, row_4, row_5]
avg_rating = (app_data_set[0][-1] + app_data_set[1][-1] + app_data_set[2][-1] + app_data_set[3][-1] + app_data_set[4][-1]) / 5
print(avg_rating)

4.2


### 7. Opening a File
The data set we've been working with so far is an extract from a much larger data set:
![image.png](attachment:image.png)

Our best strategy so far was to type each data point and bundle them efficiently into a list of lists. The data set above, however, has 7,197 rows and 16 columns, which amounts to 115,152 (7,197 × 16) data points; typing all that would take us days. We'd also be bound to make typing errors, which would eventually lead to wrong data and false conclusions. Fortunately, we can leverage Python to store this data set as a list of lists in a matter of seconds.

A data set is generally stored as a file in a computer — the data set above is stored as a file named AppleStore.csv. We start by opening the file using the open() command:
![image.png](attachment:image.png)

open('AppleStore.csv') returned the output <\_io.TextIOWrapper name='AppleStore.csv' mode='r' encoding='UTF-8'>. The output is an __object__, which we'll learn more about in the next course. For now, all we have to keep in mind is that the AppleStore.csv file will open once open('AppleStore.csv') has finished running.

Once we've opened the file, we read it in using a command called __reader()__. We import the reader() command from the csv __module__ using the code from csv import reader (a module is a collection of commands and variables — we'll learn more about modules in the next course).
![image.png](attachment:image.png)


Just like open('AppleStore.csv'), reader(opened_file) returned an object. Now that we've read the file, we can transform it into a list of lists using the list() command:
![image.png](attachment:image.png)

The apps_data variable above is a list of lists, and it stores a data set of 7,197 rows and 16 columns. Below, we print only the first five rows of apps_data by using list slicing (and color each individual row differently to help you read the output more easily):
![image.png](attachment:image.png)

Although there are 7,197 rows (apps) in our data set, len(apps_data) indicates there are 7,198 rows because it also considers the header row, which describes the column names (the first row, colored above in orange).
![image.png](attachment:image.png)

As a side note, the AppleStore.csv file is currently located on our servers. Later on in this course, we'll help you set up your own environment __locally__ — you'll be able to run Python code and open the AppleStore.csv on your own local computer (you're currently running code and opening the file in a browser).
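
To see the open(), reader(), and list() steps end to end without needing AppleStore.csv, here is a self-contained sketch. It first writes a tiny, made-up CSV file (sample.csv, with three invented columns) to a temporary folder; the file name and its contents are assumptions for illustration only, not the real data set:

```python
import os
import tempfile
from csv import reader

# Hypothetical three-line stand-in for AppleStore.csv (not the real data)
sample_csv = 'track_name,price,user_rating\nFacebook,0.0,3.5\nInstagram,0.0,4.5\n'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.csv')
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(sample_csv)

    opened_file = open(path, encoding='utf-8', newline='')  # step 1: open the file
    read_file = reader(opened_file)                         # step 2: read it in
    apps_data = list(read_file)                             # step 3: list of lists
    opened_file.close()

print(len(apps_data))   # counts the header row as well
print(apps_data[0])     # the header row
print(apps_data[1:])    # the data rows; every value is a string
```

Notice that reader() returns every value as a string, including the numbers. This matters as soon as we try to do arithmetic with the values.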

## Instructions
Open the AppleStore.csv file and store it as a list of lists.

1. Open the file using the open() command. Save the output to a variable named opened_file.
2. Read in the opened file using the reader() command (we've already imported reader() for you from the csv module). Save the output to a variable named read_file.
3. Transform the read-in file to a list of lists using the list() command. Save the list of lists to a variable named apps_data.
4. Explore apps_data. You could:
 - Print its length using the len() command
 - Print the first row (the row describing column names)
 - Print the second and the third row (try to use list slicing here)

In [16]:
from csv import reader
opened_file = open('D:/My/Learning/23--Data Scientist with Python Track-DataQuest/AppleStore.csv',encoding="utf-8")
read_file = reader(opened_file)
apps_data = list(read_file)
print(len(apps_data))
print(apps_data[0])
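# Note: apps_data[2:4] selects the third and fourth rows of the list;
# the second and third rows (as the instructions ask) would be apps_data[1:3].
print(apps_data[2:4])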

7198
['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
[['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1'], ['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']]


### 8. Repetitive Processes
Previously in this mission, we were interested in computing the average rating of an app. This was a doable task when we were working with only five rows, but our data set now has 7,197 rows. Our best strategy was to:

1. Retrieve each individual rating.
2. Sum up the ratings.
3. Divide by the number of ratings.
![image.png](attachment:image.png)

Retrieving 7,197 ratings manually is impractical because it can take a long, long time. We need to find a way to retrieve all 7,197 ratings in a matter of seconds.

Looking at the code example above, we see that a process keeps repeating: we select the last list element for each list within app_data_set. The app_data_set stores five lists, so we repeat the same process five times. What if we could tell Python directly that we want to repeat this process for each list in app_data_set?

Fortunately, we can do that — Python offers us an easy way to repeat a process, which helps us enormously when we need to repeat a process hundreds, thousands, or even millions of times.

Let's say we have a list [3, 5, 1, 2] assigned to a variable ratings, and we want to repeat the following process: for each element in ratings, print that element. This is how we could translate that into Python syntax:
![image.png](attachment:image.png)

In our first example above, the process we wanted to repeat was "extract the last element for each list in app_data_set". This is how we can translate that process into Python syntax:
![image.png](attachment:image.png)


Let's try to get a better understanding of what happens above. Python isolates, one at a time, each list element from app_data_set, and assigns it to each_list (which basically becomes a variable that stores a list — we'll discuss this more on the next screen):
![image.png](attachment:image.png)

The code in the last diagram above is a much simpler and more abstract version of the code below:
![image.png](attachment:image.png)

Using the technique above requires us to write a line of code for every row in the data set. But using the for each_list in app_data_set technique requires us to write only two lines of code regardless of the number of rows in the data set — the data set can have five rows or one million.

Our intermediate goal is to use this new technique to compute the average rating for our five rows above, and our final goal is to compute the average rating for our data set with 7,197 rows. We'll do exactly that over the next few screens of this mission, but for now, we'll focus on practicing this technique to get a good grasp of it.

Before we practice, notice that the code we want repeated must be indented four space characters to the right:
![image.png](attachment:image.png)

Technically, we only need to indent the code at least one space character to the right, but the convention in the Python community is to use four space characters. This helps with readability — it will be easier for other people who follow this convention to read your code, and it will be easier for you to read theirs.
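
A minimal sketch of the two loops described above. The name element in the first loop is our own illustrative choice; each_list follows the text:

```python
ratings = [3, 5, 1, 2]

# For each element in ratings, print that element
for element in ratings:
    print(element)   # indented four spaces, so it runs once per element

row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]
app_data_set = [row_1, row_2, row_3, row_4, row_5]

# Extract the last element for each list in app_data_set
for each_list in app_data_set:
    rating = each_list[-1]
    print(rating)
```

The second loop stays two or three lines long whether app_data_set holds five rows or one million.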

## Instructions
1. Use the new technique we've learned to print all the rows in the app_data_set list of lists.
 - Essentially, you'll need to translate this pattern into Python syntax: for each list in the app_data_set variable, print that list.
 - Don't forget about indentation.

In [17]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

app_data_set = [row_1, row_2, row_3, row_4, row_5]
for each_list in app_data_set:
    print(each_list)

['Facebook', 0.0, 'USD', 2974676, 3.5]
['Instagram', 0.0, 'USD', 2161558, 4.5]
['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
['Temple Run', 0.0, 'USD', 1724546, 4.5]
['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]


### 9. For Loops
The technique we've just learned is called a __loop__. Because we always start with for (like in for some_variable in some_list:), this technique is known as a __for loop__.

These are the structural parts of a __for loop__:
![image.png](attachment:image.png)


The indented code in the body gets executed the same number of times as elements in the iterable variable. If the iterable variable is a list that has three elements, the indented code in the body gets executed three times. We call each code execution an iteration, so there'll be three iterations for a list that has three elements. For each iteration, the iteration variable will take a different value, following this pattern:

 - For the first iteration, the value is the first element of the iterable (if the iterable is the list [1, 3, 5], then the value will be 1).

 - For the second iteration, the value is the second element of the iterable (if the iterable is the list [1, 3, 5], then the value will be 3).

 - For the third iteration, the value is the third element of the iterable (if the iterable is the list [1, 3, 5], then the value will be 5).
 ![image.png](attachment:image.png)

The code outside the loop body can interact with the code inside the loop body. For instance, in the code below we:

1. Initialize a variable a_sum with a value of zero outside the loop body.
2. We __loop__ (or __iterate__) over a_list. For every iteration of the loop, we:
 - Perform an addition (inside the loop body) between the current value of the iteration variable value and the current value stored in a_sum (a_sum was defined outside the loop body).
 - Assign the result of the addition back to a_sum (inside the loop body).
 - Print the value of the a_sum variable (inside the loop body). Notice that the value of a_sum changes after each addition. At the end of the loop, a_sum has the value 9, which is equivalent to the sum of the numbers in a_list (1 + 3 + 5).
![image.png](attachment:image.png)
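
The steps above can be sketched like this, assuming a_list is the list [1, 3, 5] mentioned in the text:

```python
a_list = [1, 3, 5]

a_sum = 0                    # defined outside the loop body
for value in a_list:
    a_sum = a_sum + value    # add the current value to the running total
    print(a_sum)             # the running total after each iteration

print(a_sum)                 # the final sum, available after the loop ends
```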

Above, we created a way to sum up the numbers in a list. We can use this technique to sum up the ratings in our data sets. Once we have the sum, we only need to divide by the number of ratings to get the average value. Let's begin with computing the average rating value for the data set with five rows.

## Instructions
Compute the average app rating for the apps stored in the app_data_set variable.

1. Initialize a variable named rating_sum with a value of zero outside the loop body.
2. Loop (iterate) over the app_data_set list of lists. For each of the five iterations of the loop (for each row in app_data_set):
 - Extract the rating of the app and store it to a variable named rating. The rating is the last element of each row.
 - Add the value stored in rating to the current value of the rating_sum.
3. Outside the loop body, divide the rating sum (stored in rating_sum) by the number of ratings to get an average value. Store the result in a variable named avg_rating.

In [35]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

app_data_set = [row_1, row_2, row_3, row_4, row_5]

rating_sum = 0
for each_row in app_data_set:
    rating = each_row[-1]
    rating_sum = rating_sum + rating

avg_rating = rating_sum / 5
print(avg_rating)

4.2


### 10. The Average App Rating
Now we move on to computing the average rating for the data set that has 7,197 rows. Remember we first need to open the file AppleStore.csv and transform it into a list of lists:
![image.png](attachment:image.png)

If we use the technique we learned and loop over apps_data to get the rating sum, we'll get a __TypeError__:
![image.png](attachment:image.png)


This error happens because the first row of apps_data doesn't contain numbers (it describes column names). In the loop body, we assign the value of row[7] to the rating variable, and then we add rating to rating_sum. But for the first iteration of the loop, row[7] takes the string value 'user_rating' (which is a column name). This means that running rating_sum + rating is equivalent to 0 + 'user_rating', which causes a TypeError because strings and integers cannot be added together.

Theoretically, we'd have two solutions:

1. We remove the first row from apps_data, and then we start over the iteration. We do that by:
 - Saving the header row to a separate variable named header
 - Saving apps_data[1:] back to apps_data (apps_data[1:] is a list slice that excludes the first row, the header row)
2. We iterate directly over apps_data[1:], which is a list slice that excludes the first row.
![image.png](attachment:image.png)

For some reason, we got the same error. Upon inspecting some of the rows in apps_data, we see that all the values are surrounded by quotation marks, which suggests they are strings. Once again, the error is caused by trying to add a string to an integer.
![image.png](attachment:image.png)

In the previous mission, we learned to convert strings to integers or floats (decimal numbers) using the int() and float() commands. The ratings are expressed as decimal numbers, so we'll convert them to floats using the float() command.
![image.png](attachment:image.png)
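
Here is a small, self-contained sketch of both problems and the fix. The list apps_sample is a made-up miniature of apps_data (a header row plus two rows of strings, the way reader() returns them), so its contents and column positions are assumptions for illustration:

```python
# Hypothetical miniature of apps_data: a header row, then rows of strings
apps_sample = [
    ['track_name', 'user_rating'],
    ['Facebook', '3.5'],
    ['Instagram', '4.5'],
]

# Adding a number and a string raises a TypeError, even for a string like '3.5'
try:
    0 + apps_sample[1][1]
except TypeError as error:
    print('TypeError:', error)

# Skip the header with a slice and convert each rating to a float
rating_sum = 0
for row in apps_sample[1:]:
    rating = float(row[1])
    rating_sum = rating_sum + rating

avg_rating = rating_sum / len(apps_sample[1:])
print(avg_rating)
```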

## Instructions
Compute the average app rating for all the 7,197 apps stored in the data set.

1. Initialize a variable named rating_sum with a value of zero.
2. Loop through the apps_data[1:] list of lists (make sure you don't include the header row). For each of the 7,197 iterations of the loop (for each row in apps_data[1:]):
 - Extract the rating of the app and store it to a variable named rating (the rating has the index number 7). Make sure you convert the rating value from a string to a float using the float() command.
 - Add the value stored in rating to the current value of the rating_sum.
3. Divide the rating sum (stored in rating_sum) by the number of ratings to get an average value. Store the result in a variable named avg_rating.

In [36]:
opened_file = open('AppleStore.csv', encoding='utf-8')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

rating_sum = 0
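for row in apps_data[1:]:
    # Caution: index 7 is user_rating only if the file has no leading index column.
    # The header row printed in section 7 (Opening a File) starts with '', which puts
    # rating_count_ver at index 7 and user_rating at index 8 in that copy of the file,
    # so the value printed below is the mean of rating_count_ver, not the average rating.
    rating = float(row[7])
    rating_sum += rating
avg_rating = rating_sum / (len(apps_data[1:]))
print(avg_rating)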

460.3739057940809


### 11. Alternate Way to Compute an Average
Now we'll learn an alternative way to compute the average rating value. Once we create a list, we can add (or append) values to it using the append() command.

![image.png](attachment:image.png)

Unlike other commands we've learned, notice that append() has a special syntactical usage, following the pattern list_name.append() rather than being simply used as append() (we'll get a better understanding of this syntactical quirk once we learn about functions and methods).

Now that we know how to append values to a list, we can take the steps below to compute the average app rating:

1. We initialize an empty list.
2. We start looping over our data set and extract the ratings.
3. We append the ratings to the empty list we created at step one.
4. Once we have all the ratings, we:
 - use the sum() command to sum up all the ratings (to be able to use sum(), we'll need to store the ratings as floats or integers); and then
 - we divide the sum by the number of ratings (which we can get using the len() command).

Below, we can see the steps above implemented for our data set with five rows:
![image.png](attachment:image.png)
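
A minimal sketch of append() followed by the averaging steps for the five rows. The list a_list is an illustrative name; all_ratings and avg_rating follow the naming used in the exercise:

```python
# append() adds a value to the end of an existing list
a_list = [1, 2]
a_list.append(3)
print(a_list)

row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]
app_data_set = [row_1, row_2, row_3, row_4, row_5]

all_ratings = []                  # step 1: start with an empty list
for row in app_data_set:          # step 2: loop over the data set
    rating = row[-1]
    all_ratings.append(rating)    # step 3: collect each rating

# step 4: sum the ratings and divide by how many there are
avg_rating = sum(all_ratings) / len(all_ratings)
print(avg_rating)
```

Dividing by len(all_ratings) rather than a hard-coded 5 means the same code keeps working when the number of rows changes.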

## Instructions
Using the new technique we've learned, compute the average app rating for all of the 7,197 apps stored in our data set.

1. Initialize an empty list named all_ratings.
2. Loop through the apps_data[1:] list of lists (make sure you don't include the header row). For each of the 7,197 iterations of the loop:
 - Extract the rating of the app and store it to a variable named rating (the rating has the index number 7). Make sure you convert the rating value from a string to a float.
 - Append the value stored in rating to the list all_ratings.
3. Compute the sum of all ratings using the sum() command.
4. Divide the sum of all ratings by the number of ratings, and assign the result to a variable named avg_rating.

In [41]:
opened_file = open('AppleStore.csv',encoding='utf-8')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

all_ratings = []
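for row in apps_data[1:]:
    # Caution: index 7 is user_rating only if the file has no leading index column.
    # In a copy whose header starts with '' (as printed in section 7), index 7 holds
    # rating_count_ver and user_rating sits at index 8, so the result below is the
    # mean of rating_count_ver, not the average rating.
    rating = float(row[7])
    all_ratings.append(rating)
avg_rating = sum(all_ratings) / len(all_ratings)
avg_rating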

460.3739057940809

### 12. Next Steps
In this mission, we learned:

 - How to read in a large data set as a list of lists
 - How to use lists and for loops to analyze a large data set

We've made some nice, steady progress so far in the course and come a long way, from learning to perform a simple addition like 1 + 3 to analyzing a data set of 7,197 rows using for loops and lists of lists.

In the next mission, we'll build on what we already know and learn new techniques that will enable us to do more complex data analysis.

## Mission 4/9: Conditional Statements