```
BEGIN ASSIGNMENT
requirements: requirements.txt
solutions_pdf: true
export_cell:
    instructions: "These are some submission instructions."
generate: 
    pdf: true
    zips: false
export_cell:
    pdf: false
    instructions: "Please submit the resultant .zip file to the SciTeens platform"
```

# Lesson One: Introduction to Data Science

Welcome, and thanks for joining us! In this four-part course, you will learn the basic concepts of one of the world's most popular coding languages (i.e. Python) and dive into the field of Data Science. 

As we teach, please make sure to ask plenty of questions on anything you're confused about and never hesitate to let us know if we're moving too quickly!

To get started, 
- If you've never used a Jupyter Notebook (i.e. the file you are currently on), make sure to try this 5-minute tutorial. 
- We _highly_ encourage you to experiment with the code and datasets in these files.  If you break anything, close the tab and open the activity again to start over.

By the end of this course, you will have two main things. The first is a toolkit that is essential in _any_ field (e.g. science, business, robotics) you choose to explore. The second is a jumping-off point for any research you decide to do.

Let's get started by learning about Data Science!

![Data](https://media.giphy.com/media/hOzfvZynn9AK4/giphy.gif)

### What is Data Science? 

To understand Data Science, we must know these two definitions: 
- **Data**: pieces of information 
- **Dataset**: a collection of data stored in an organized way 

Let's say we were to give you this piece of data:
<p align="center">AK,2015,Olivia,56 </p>

For it to be useful, we must label it. Therefore, let's label "AK" as a state, "2015" as a year, "Olivia" as a name, and "56" as a number of babies. Given the data with labels, we can deduce that in 2015 there were 56 babies named Olivia in Alaska (AK is the postal abbreviation for Alaska). If we modify any label, our interpretation of the data changes drastically. For example, by changing "56" to be the number of dogs instead of babies, we would conclude that in 2015 there were 56 dogs named Olivia in Alaska, which is different from our original conclusion above. 
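
Here is a small sketch of the labeling idea in Python. The variable names are our own choice for illustration, and we'll explain variables properly later in this lesson, so don't worry if this looks unfamiliar.

```python
# Each label is a variable name; each piece of data is the value it stores.
state = "AK"
year = 2015
name = "Olivia"
number_of_babies = 56

# With labels attached, the data can be read as a sentence.
print(number_of_babies, "babies named", name, "in", state, "in", year)

# Changing only the label changes the meaning, even though the value is the same.
number_of_dogs = 56
print(number_of_dogs, "dogs named", name, "in", state, "in", year)
```

Both lines print the same numbers, but the labels tell two very different stories.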

With a large amount of data, we can start to find meaningful patterns called **trends**. Trends simply describe the general direction of the data. Some examples include: 
 
- X is increasing.
- Y is moving upward.
- Z is the same. 
- R is changing at a faster rate than Y.
    
A more relevant example: given a dataset containing the names of people diagnosed with COVID-19 in the USA and the month they were diagnosed, we can find out whether the number of cases in the USA is increasing, decreasing, or staying the same. 
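
To make the idea of a trend concrete, here is a rough sketch. The monthly counts are made up for illustration (they are not real data), and comparing only the first and last month is a very crude way to spot a trend. The `if` statements used here are not covered in this lesson.

```python
# Made-up number of cases for four months in a row (illustration only).
monthly_cases = [120, 150, 210, 340]

first_month = monthly_cases[0]
last_month = monthly_cases[-1]

# Compare the last month to the first month to describe the general direction.
if last_month > first_month:
    trend = "increasing"
elif last_month < first_month:
    trend = "decreasing"
else:
    trend = "the same"

print("The number of cases is", trend)
```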
  
The people who find these trends are those who study Data Science. It's defined as: 


**The application of data-centric, computational, and inferential thinking to:**

1. Understand the world (science)
<br>
2. Solve problems (engineering)

In simple terms, it is the study of data with the goal of extracting useful information from it. 

To achieve this goal, data scientists combine aspects of statistics, computer science, and mathematics. However, we'll soon see that just because Data Science has these fields at its core, it doesn't mean that it's limited to them. In reality, Data Science has a role in virtually any STEM (i.e. Science, Technology, Engineering, and Mathematics) field you can imagine. In this course, we'll be applying Data Science to the field of Ecology.



## 🐍 Section One: Python

Just as the world has many human languages, like Spanish, English, and Italian, computers have many languages as well. There are over 500 computer languages, including Java, C#, and Python. 

In this class, we will be using Python to analyze data. By writing in this language, you make sure your computer can understand the code you write.

When you write a piece of code, you are writing what's known as an **algorithm**. This is a step-by-step set of instructions which tell the computer what to do. It is important to know that the computer will _only_ do what you tell it to and will not infer any step even if it's obvious to you. Therefore, you must be specific at every step. If you aren't, your program will not work. By writing an algorithm in a language your computer understands (e.g. Python), you can build anything!
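
As a tiny illustration of an algorithm, here is a sketch that finds the average of three measurements. The numbers and variable names are made up for this example. Notice that every step is spelled out, because the computer won't fill in any step for us.

```python
# Step 1: write down the three measurements (made-up numbers).
first = 4
second = 8
third = 6

# Step 2: add them together.
total = first + second + third

# Step 3: divide the total by how many measurements there are.
average = total / 3

# Step 4: show the result.
print(average)  # 6.0
```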

### Primitive Types

But, before we start learning a language, we must know the basic building blocks of it. In a computer language, such as Python, these blocks are called **primitive types**. These primitive types are the basis of computer languages just like letters are to English.

In Python, these are the primitive types: 

- integer: 1, 2, 3, 4, 5 ...
- float: 1.2, 1.453, 0.33333333 ... 
- string: "data", "science", "Python" ...
- boolean: True, False
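
If you're ever unsure which type a value has, Python's built-in `type()` function will tell you. This short sketch checks one example of each type from the list above. Python's own names for these types are `int`, `float`, `str`, and `bool`.

```python
print(type(3))       # <class 'int'>
print(type(1.2))     # <class 'float'>
print(type("data"))  # <class 'str'>
print(type(True))    # <class 'bool'>
```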


### Operators

To analyze data, we might do some sort of math computation. In a computer language, the multiplication, division, addition, and subtraction symbols are known as **operators**.

In Python, these are the operators: 

- \+ : add
- \- : subtract
- \* : multiply
- / : divide

You can see how to use them below!

Note: In order to run our code, we have two options. You can either click the play button to the left of the block to run the code in Google Colab, or you can click on the block and press the keys shift + enter.


In [None]:
3 * 5

In [None]:
20 / 5

In [None]:
Now, try this on your own!

See the `#` symbol below? This isn't us using hashtags within our code, but instead, it's used for **commenting**. Comments allow us to leave messages in our code, which won't be run when we run the rest of the code. Essentially, Python / Jupyter will ignore any line of code that has a `#` before it. 
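
Here's a quick sketch of comments in action. The variable name `total` is just for illustration.

```python
# This whole line is a comment, so Python skips it.
total = 3 + 4  # A comment can also come after code on the same line.

# The next line is "commented out", so it never runs:
# total = 100

print(total)  # 7
```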

In [None]:
# TODO: Add six and seven


In [None]:
# TODO: Multiply two by twenty-four


In [None]:
# TODO: Divide six by four 


In [None]:
# TODO: Subtract twenty from seventeen


The last two operators that we'll learn are listed below. They are super useful for dealing with situations in which we want to compare values: 

- == : used to check equality 
- & : represents and (we'll talk more about this soon)

Just a quick note, the equality operator is two equal signs in a row because it is different than the **assignment** `=` operator that we'll show you shortly.

When you want to see if two things are equal to each other, you use the equality operator. This tells you if two values are the same. Let's check these out in action!

In [None]:
'Hello' == 'Hello'

In [None]:
6 == 6

In [None]:
'Good Morning' == 'Good Night'

Simple enough! Now, let's look at the and operator. In this lesson we'll use it with boolean values, where it acts as a **logical operator**. Recall that a boolean is a primitive type representing True or False. You can think of the and operator as a middleman who only cooperates when both sides are willing to cooperate. If both sides cooperate (if both sides stay *True* to their word), then the and operator returns True. In any other case, the and operator returns False. Let's check out how it works for all cases: 

In [None]:
True & True

In [None]:
False & True

In [None]:
True & False

In [None]:
False & False

As we can see, the only time the and operator results in True is when both values are True. We'll work with this more when it comes time to process and examine our data. Oh, and just so you know, there are more logical operators out there besides the and operator. We won't cover them so that you don't feel overwhelmed with information, but if you want to explore them alongside other Python operators, be sure to check out [this tutorial](https://www.w3schools.com/python/python_operators.asp).
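
Here is a small sketch that puts `==` and `&` together, since that's how we'll usually use them when comparing values. The variable names are made up for illustration. Note the parentheses around each comparison: Python evaluates `&` before `==`, so without them the expression would not mean what we want.

```python
# Both comparisons are True, so the result is True.
both_match = (6 == 6) & ('Hello' == 'Hello')

# The second comparison is False, so the result is False.
one_differs = (6 == 6) & ('Good Morning' == 'Good Night')

print(both_match)   # True
print(one_differs)  # False
```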

## 🖥️ Section Two: Variables
Operators weren't all that shabby! Next, let's try creating a **variable**. A variable stores a value so that we can access it later. For example, to set the variable *Name* to "Brandon", we use the **assignment** `=` operator:

In [None]:
Name = "Brandon"

I can then call *Name* later to access the value stored there

In [None]:
Name
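
Variables can also be used inside calculations, and they keep their value until we assign a new one. Here's a minimal sketch with made-up variable names:

```python
apples = 3
oranges = 4

# Use the stored values in a calculation and save the result.
fruit = apples + oranges
print(fruit)   # 7

# Assigning again replaces the old value of apples...
apples = 10
print(apples)  # 10

# ...but fruit was already calculated, so it does not change by itself.
print(fruit)   # 7
```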

Try it yourself by setting the variable *seven* to the number seven.

In [None]:
# TODO: Set 'seven' to the number seven
seven 

Great work! One of the last important concepts that we'll learn about today is the **list**. Lists allow us to store multiple values at once. Think of it like a grocery list; if we want to remember to buy Bananas, Captain Crunch, and Pasta, we could write it down as
```
Bananas,
Captain Crunch,
Pasta
```
We can achieve the same in Python!

In [None]:
groceries = ['Bananas', 'Captain Crunch', 'Pasta']

Since we saved our list in a variable called *groceries*, we can check them out by running 

In [None]:
groceries

How about you try making your own shopping list similar to the one above?

In [None]:
# TODO: Create a shopping list with at least four items
shopping_list = []

Now, let's check out your shopping list!

In [None]:
shopping_list

Pretty sweet! Now this isn't to say that you should keep grocery lists as Python lists from now on; I doubt you want to haul your computer to the store just to buy three things. But lists are one of the most important **collections** within Python when it comes to data science. Now let's see some of the cool things we can do with lists.
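
One handy thing to know before we go on: Python's built-in `len()` function tells us how many values a list holds. A quick sketch using the same groceries list:

```python
groceries = ['Bananas', 'Captain Crunch', 'Pasta']

# len() counts the items stored in the list.
number_of_items = len(groceries)
print(number_of_items)  # 3
```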

## 📜 Section Three: Lists, Continued

![List](https://media.giphy.com/media/F0QWePzwQRewM/giphy.gif)

Since we're storing multiple values within one variable, it's important that we're able to access the individual values from the list. To get certain values, we can use **indexing**. Brace yourselves, because here comes a pretty tricky concept to remember. Python uses **Zero-based indexing**, which means that it starts counting from zero instead of from one. For example, to access the first item from the groceries list, we would use

In [None]:
groceries[0]

This is pretty weird to understand at first, but it'll make more sense as you go on and practice. Try it out yourself by grabbing the third item from the grocery list

In [None]:
# TODO: Get the third item from the groceries list
groceries[ ]

Nice job. It's cool to be able to grab a single value from a list at a time, but what if I want to get multiple? This can be achieved using a colon when selecting a value. If we wanted to fetch the first and second value from the groceries list, we could use 

In [None]:
groceries[0:2]

Now, that may seem off at first: why would we specify ```0:2``` if we want the first and the second values? Python slicing with a colon will always grab the first value that you specify, up to (and not including) the last value that you specify. For example, if we wanted to get the second and third values, we could use 

In [None]:
groceries[1:3]
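
If the "up to, but not including" rule still feels odd, this sketch may help. It uses a made-up list called `letters` so that we can see exactly which positions a slice picks up.

```python
# Positions:  0    1    2    3    4
letters =  ['a', 'b', 'c', 'd', 'e']

# Start at position 1 and stop just before position 4.
middle = letters[1:4]

print(middle)       # ['b', 'c', 'd']
print(len(middle))  # 3, which is 4 - 1
```

A nice side effect of this rule is that the number of values you get back is simply the second number minus the first.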

Try it yourself with your own shopping list! Try getting the second and third values from your list below. 

In [None]:
# TODO: Get the second and third values from your shopping list
shopping_list[]

Great work! One last thing that we'll learn is shorthand notation for list indexing. That's a whole lot of fancy words for getting ranges of values from our list more easily. This is better shown than explained, so here are some examples of how to achieve certain tasks. For starters, we'll select all values up to (but not including) the third value in the groceries list. 

In [None]:
groceries[:2]

Notice that we don't have to put a zero in front of the colon to achieve this; Python is smart and knows to start at zero. What if we wanted to select all values including and after the second value? 

In [None]:
groceries[1:]
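
One more piece of shorthand that will come in handy in the practice questions: Python also lets us count from the end of a list using negative numbers, where `-1` means the last value. A short sketch with the same groceries list:

```python
groceries = ['Bananas', 'Captain Crunch', 'Pasta']

# -1 is the last value, -2 is the second to last, and so on.
last_item = groceries[-1]
print(last_item)      # Pasta

# A negative start with nothing after the colon grabs the last few values.
last_two = groceries[-2:]
print(last_two)       # ['Captain Crunch', 'Pasta']
```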

Sick! Now, try it yourself with your shopping list.

In [None]:
# TODO: Select all values up to the fourth value in your shopping list 
shopping_list[]

In [None]:
# TODO: Select all values after the first value in your shopping list
shopping_list[]

## ➰ Section Four: Looping through Iterables

![Loop](https://media.giphy.com/media/TabwFck9vEt44/giphy.gif)

The last things we'll learn about today are **iterables** and **loops**. Conveniently, we've already learned about one of the most common iterables in Python: a list! An iterable is simply something that can be iterated over. Wow, that was an awful definition. Here's a better one: an iterable is something that we can loop through to check out each of the values within it. We can achieve this with a **for loop**. Say we wanted to print out each item in the grocery list individually. We could do so as below:

In [None]:
for grocery in groceries:
    print(grocery)

Notice how when we make the for loop, we created a new variable *grocery* that stores the value for a particular grocery within our groceries list. This variable can have any name you want it to, like this:

In [None]:
for pet in groceries:
    print(pet)

That doesn't make much sense, but it works nonetheless. Moral of the story: stick to naming your variables in ways that best describe your data. Now let's make a list of numbers from one to four.

In [None]:
numbers = [1,2,3,4]

Let's try adding the number two to each number in the list, and then printing that sum. We can do that using the code below

In [None]:
for number in numbers:
    print(number + 2)
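
To see what the for loop above is doing behind the scenes, here is the same work written out by hand, one step at a time. This is only a sketch to build intuition; in practice, the loop is much shorter and works for a list of any length.

```python
numbers = [1, 2, 3, 4]

# The loop puts each value into the variable `number`, one after another,
# and runs the indented code once for each value.
number = numbers[0]
print(number + 2)  # 3

number = numbers[1]
print(number + 2)  # 4

number = numbers[2]
print(number + 2)  # 5

number = numbers[3]
print(number + 2)  # 6
```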

It's lit! Try out for loops yourself with the code below

In [None]:
## TODO: Multiply each value in the numbers list by four, and then print the number


In [None]:
## TODO: Raise each number in the numbers list to the third power, and then print the number


In [None]:
## CHALLENGE: Print each item in the groceries list, as well as " is delicious"


## ✏️ Practice
That's it for this week's lesson! Now, to make sure that you understand what we learned today, be sure to complete the practice questions below. 

First things first, we can use order of operations within Jupyter Notebook environments. Remember PEMDAS? It works like a charm in Jupyter and Python! Say we wanted to calculate 
$$(2 * 5 + 3)^2 $$
We could do so with

In [None]:
(2 * 5 + 3)**2
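
To see why the parentheses matter, here is a short sketch comparing the same numbers with and without them. The variable names are made up for illustration.

```python
# Parentheses first: (10 + 3) ** 2 = 13 ** 2
with_parentheses = (2 * 5 + 3) ** 2
print(with_parentheses)     # 169

# No parentheses: the exponent goes first, so this is 10 + 9
without_parentheses = 2 * 5 + 3 ** 2
print(without_parentheses)  # 19
```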

### Question One
Calculate $ (\frac{1}{2} - 2 * 8)^5 $, and set it to the variable `solution_1`
```
BEGIN QUESTION
name: q1
points: 2
```

In [2]:
solution_1 = ...
solution_1 = (0.5 - 2 * 8) ** 5 # SOLUTION

In [5]:
# TEST
isinstance(solution_1, float)

True

In [7]:
# TEST
abs(solution_1 - -894660.96875) < 0.0000001

True

Good job. Let's move onto strings and variables now. If we want to **concatenate**, or combine two strings, we can simply add them! For example, if we were to create "Hello World" from "Hello" and "World", we could do

In [None]:
print("Hello" + " " + "World")

Make sure to note that I added a " " so that we didn't end up with "HelloWorld". Now, try it out yourself. 
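
The same idea works when the strings are stored in variables. Here's a quick sketch with made-up variable names, showing the result with and without the space:

```python
first = "Good"
second = "Morning"

# Without a space in between, the words run together.
no_space = first + second
print(no_space)    # GoodMorning

# Adding " " in the middle keeps the words apart.
with_space = first + " " + second
print(with_space)  # Good Morning
```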

### Question Two
You have four variables (I, Love, Data, and Science). Each represents a string. Combine them all into a phrase, assign that phrase to the variable `phrase`, and then print out that phrase. 
```
BEGIN QUESTION
name: q2
points: 2
```

In [8]:
I = "I"
Love = "Love"
Data = "Data" 
Science = "Science" 
phrase = ...
phrase = I + " " + Love + " " + Data + " " + Science # SOLUTION

In [9]:
# TEST 
isinstance(phrase, str)

True

In [12]:
# TEST 
phrase == 'I Love Data Science'

True

### Question Three
Create a string that says "Hello", repeated three times ("HelloHelloHello"), and assign it to the variable `hello`.  **TIP:** You can also multiply strings by numbers in Python.
```
BEGIN QUESTION
name: q3
points: 2
```

In [17]:
hello = ...
hello = 'Hello' * 3 # SOLUTION

In [18]:
# TEST
isinstance(hello, str)

True

In [19]:
# TEST
hello == 'HelloHelloHello'

True

Now let's practice working with lists. Below is some fancy code for creating a super long list. With this list, we'll practice indexing.

In [8]:
super_long_list = [i for i in range(123457)]

### Question Four
Get the requested values from the list
```
BEGIN QUESTION
name: q4
points: 4
```

In [2]:
# TODO: Get the second value from the list, and set it to the variable `second`
second = ...
second = super_long_list[1] # SOLUTION

In [3]:
# TODO: Get the first five values from the list, and set it to the variable `first_five`
first_five = ...
first_five = super_long_list[:5] # SOLUTION

In [4]:
# TODO: Get the last value from the list, and set it to the variable `last`
last = ...
last = super_long_list[-1] # SOLUTION

In [5]:
# TODO: Get the last fourteen values from the list, and set it to the variable `last_fourteen`
last_fourteen = ...
last_fourteen = super_long_list[-14:] # SOLUTION

In [6]:
# HIDDEN TEST 
second == 1

True

In [7]:
# HIDDEN TEST 
first_five == [0, 1, 2, 3, 4]

True

In [8]:
# HIDDEN TEST 
last == 123456

True

In [9]:
# HIDDEN TEST 
last_fourteen == [123443,
 123444,
 123445,
 123446,
 123447,
 123448,
 123449,
 123450,
 123451,
 123452,
 123453,
 123454,
 123455,
 123456]

True

You're going ham! Last but not least, we'll practice working with iterables and for loops. For this, we'll stick to working with the super long list we created in the last problem.

### Question Five
Print the last twenty values from the list, using a for loop.
```
BEGIN QUESTION
name: q5
points: 2
```

In [None]:
# TODO: Print the last twenty values from the list


### Challenge Question
Now, time for our last question today. This one will be pretty hard, so major props to you if you can figure it out. This question is multistep, and it will prove that you've mastered the content we learned today. Just remember: it's not cheating if you Google how to achieve some of the steps in this question.

First, get the 100th through 150th values from the super long list and store them in a new list. Remember that the list starts at 0, so the 100th value is the number 99.  

Then, square each of these values. After you've calculated each square, add it to the list called *cubed_over_two* (the starter code below already creates this list for you, so keep the name as-is). To achieve this, you'll likely have to use a for loop, as well as the list **method** `.append()` (more on methods next lesson). `.append()` works like this: if we have a list such as 
`my_list = [1,2,3,4]`, and we run `my_list.append(5)`, *my_list* is now `[1,2,3,4,5]`
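
Here is a small sketch of how a for loop and `.append()` can work together to build a new list. It uses a different task (doubling numbers) and made-up names, so you'll still need to adapt the idea for the challenge.

```python
numbers = [1, 2, 3, 4]

# Start with an empty list to collect the results.
doubled = []

# For each number, calculate a new value and add it to the end of the list.
for number in numbers:
    doubled.append(number * 2)

print(doubled)  # [2, 4, 6, 8]
```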

Finally, set the last three numbers from this new list to the variable `last_three`. Best of luck!
```
BEGIN QUESTION
name: challenge
points: 4
```

In [9]:
# TODO: Get the 100th through 150th values
values = ...
values = super_long_list[99:150] # SOLUTION
cubed_over_two = []

# TODO: Square each value and add it to cubed_over_two
# BEGIN SOLUTION
for i in values:
    cubed_over_two.append(i ** 2)
# END SOLUTION

# TODO: Set the last three numbers from the list to `last_three`
last_three = ...
last_three = cubed_over_two[-3:] # SOLUTION

In [12]:
# TEST 
values == [99,
 100,
 101,
 102,
 103,
 104,
 105,
 106,
 107,
 108,
 109,
 110,
 111,
 112,
 113,
 114,
 115,
 116,
 117,
 118,
 119,
 120,
 121,
 122,
 123,
 124,
 125,
 126,
 127,
 128,
 129,
 130,
 131,
 132,
 133,
 134,
 135,
 136,
 137,
 138,
 139,
 140,
 141,
 142,
 143,
 144,
 145,
 146,
 147,
 148,
 149]


True

In [14]:
# TEST 
last_three == [21609, 21904, 22201]

True