![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fhackathon&branch=master&subPath=ColonizingMars/ChallengeTemplates/challenge-option-1-should-we-colonize-Mars.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

In [None]:
# Run this cell
import interactive as i
from IPython.display import IFrame

# Best Friends in Space: Which pets should we bring to Mars? 

As one of the world's leading experts on cute animals, you have been asked by NASA to help decide which pets to bring to Mars, since it might get lonely out there. In this adventure, you will learn data science skills and solve several challenges that will help you reach your goal. 

If you're stuck at any point, you can use [this link](https://hub.callysto.ca/jupyter/user/fbde8dcc92b359bd09845d32c52390da68637752/tree/hackathon/ColonizingMars/Tutorials) to access some tutorial notebooks. You can also check out [this](https://www.youtube.com/playlist?list=PL-j7ku2URmjZ1F3-9jvBuvsf0KcWPsxab) video series that can help you cover some basics. **Remember, to "run" a notebook cell, we can either use the button above or press `shift + enter`.** Make sure you do this in every cell that starts with `# Run this cell` and cells you write code in. If it doesn't work, try hitting the stop button (square button on the top bar) and try again.

To begin, NASA has assigned you a helper named **Rover**. Rover will provide hints along the way and help you when you get stuck. Remember that you can save this notebook by pressing `ctrl (or cmd on mac) + s` or going to `File -> Save and Checkpoint` in the toolbar. For more information see [notebook basics](./notebook-basics.ipynb).


In [None]:
# Run this cell
i.userinfo()

## Challenge #1: Accessing the Pet Archives

In this challenge, you are tasked with using computer commands to access the Pet Archives data, which was collected by Bootstrap World. 

**Rover**: *ruff!* The first thing we should do is tell our notebook that we want to access the group of computer commands that lets us work with data. I think it was called `pandas`..?

In [None]:
# Run this cell
display(IFrame('https://www.youtube.com/embed/P6KNkb7xVjk', width=560, height=315)) 

In [None]:
# Run this cell
i.challenge1a()

### Debrief: Functions, Libraries, and Filenames
Now that we have the special commands we need to access the data, let's use them! The data file they gave us is called `pets_from_bootstrap_world.csv`. The `.csv` at the end tells us it is a CSV file, which stands for "Comma Separated Values". The `pandas` library has a special function for reading this kind of file, called `read_csv()`. 

Python is a little tricky though: our code has to say which library a function comes from. For example, if I wanted to use the `loadtxt()` function from the `numpy` library, I would write `numpy.loadtxt('file.txt')`. See how the library comes first, then a `.`, then the function? We then use the function on the file by putting the file name inside the brackets. The file name is wrapped in quotation marks (`" "` or `' '`) so that Python reads it as text. 
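Here is a small sketch of the `library.function()` pattern with `pandas`. So that it runs anywhere, it reads from a few made-up lines of text instead of a real file. The rows and the names `csv_text` and `sample` are invented for this example; in the challenge you would pass the file name in quotes instead.

```python
import io
import pandas

# Stand-in for a CSV file on disk. These rows are invented for illustration.
csv_text = "Name,Species,Legs\nPixel,cat,4\nComet,dog,4\n"

# Library name first, then a dot, then the function name.
sample = pandas.read_csv(io.StringIO(csv_text))
print(sample)
```

Each comma in the text separates one value from the next, and the first line becomes the column names.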

In [None]:
# Run this cell
i.challenge1b()

### Debrief: How to Avoid Writing Long Code Over and Over Again

Now that we've learned how to access the data, we need to make a codeword for it so that we can easily refer to it later. This codeword is called a variable. Think about it this way: the codeword's value can vary depending on what you put on the right side of the `=` sign. The codeword itself is the variable name, and once you give it a value you can use the name to mean that value anywhere else in the notebook. For example, if I ran `a = 2` in a cell, then whenever I type `a` afterwards it means the same as `2`.
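As a quick sketch of the idea, the cell below stores a value under one name and then reuses it. The name `b` is just an extra example name, not something the challenge needs.

```python
a = 2        # store the value 2 under the name a
b = a + 3    # a stands for 2 here, so b becomes 5
print(a)
print(b)
```

The same pattern works for bigger things than numbers: putting a dataset on the right side of the `=` gives the whole dataset a short name.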

In [None]:
# Run this cell
i.challenge1c()

### Challenge Complete!
Wow, great job! You've completed the very first challenge with flying colours. Now, before moving on to the next challenge, write down the ages of the pets in rows `28` (Miaulis), `15` (Gir), and `11` (Maple). **Make sure you write them somewhere safe, because you never know when they will come in handy! ;)**

## Challenge #2: Making Data Easy

Now that we have successfully accessed the data archives from our friends back on Earth, it's time to explore what pets there are. This will help us get an idea of not only what our options are, but also which pets are most suitable for life on Mars. 

**Rover**: *I hope it's dogs!*

In [None]:
# Run this cell
display(IFrame('https://www.youtube.com/embed/gCcyJdQyvjo', width=560, height=315)) 


### Debrief: Heads and Tails
You might notice when you view the dataset, it's the *whole* dataset. Say we wanted to only see the first few or the last few rows of the data. To do this we can use `dataset.head()` to see the first 5 rows or `dataset.tail()` to see the last 5 rows. Try it out below! :) 
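Here is a minimal sketch of both commands, using a made-up dataset with ten rows so the difference is easy to see. The column name `Pet number` is invented for this example.

```python
import pandas

# Invented sample data: ten rows numbered 0 to 9.
dataset = pandas.DataFrame({"Pet number": range(10)})

first_rows = dataset.head()    # the first 5 rows by default
last_rows = dataset.tail(3)    # a number in the brackets changes how many rows you get

print(first_rows)
print(last_rows)
```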

In [None]:
# Run this cell
i.challenge2a()

### Debrief: Selecting Data Columns

With the data now easily accessible, let's try using some cool functions from the `pandas` library to get to know our data more. You can also check out the Tutorial's [pandas cheatsheet](./pandas-dataframes.ipynb) for a whole list of things we can do.

For example, to access the values in a column, we can use `dataset["column_name"]`. In Python, `""` and `''` work the same way. You can also select multiple columns like this: `dataset[["column_name1", "column_name2"]]`. Note that the double `[[ ]]` gives you a dataframe, which is what we want most of the time, while a single `[ ]` gives you just the one column on its own (called a Series).

To list all the columns in a dataset, use `dataset.columns` or `list(dataset.columns)` for a list instead of an index.
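The sketch below tries out single brackets, double brackets, and the column list on a tiny made-up dataset. The pet names and values are invented for illustration and are not taken from the Pet Archives.

```python
import pandas

# Invented sample data for illustration.
dataset = pandas.DataFrame({
    "Name": ["Pixel", "Comet", "Nova"],
    "Species": ["cat", "dog", "rabbit"],
    "Legs": [4, 4, 4],
})

one_column = dataset["Name"]                 # single brackets: one column (a Series)
two_columns = dataset[["Name", "Species"]]   # double brackets: a smaller dataframe
column_names = list(dataset.columns)         # plain Python list of column names

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(column_names)
```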

In [7]:
# Run this cell
i.challenge2b()

First, we will need to know what columns are in the data. We can do this by either looking at the dataset again, or use list(dataset_variable_name.columns)
 Try filling in the command: 
 list(pets.columns)
You rock! 


['Name',
 'Species',
 'Gender',
 'Age (years)',
 'Fixed',
 'Legs',
 'Weight (lbs)',
 'Time to Adoption (weeks)']

In [8]:
# Run this cell
i.challenge2c()

Now that we have all the column names, let's choose a couple to look at. How about the 'Name' column?
 Select the column using dataset_variable_name["column_name"].
pets['Name']
Glad we have you around! 


0                Sasha
1              Mittens
2            Sunflower
3                Sheba
4                Felix
5             Snowcone
6                 Wade
7             Hercules
8               Toggle
9              Boo-boo
10               Fritz
11               Maple
12                  Bo
13            Midnight
14                 Rex
15                 Gir
16                 Max
17                Nori
18    Mr. Peanutbutter
19               Lucky
20                Kujo
21               Buddy
22                Gila
23            Snuffles
24             Nibblet
25            Snuggles
26               Daisy
27                 Ada
28             Miaulis
29          Heathcliff
30             Tinkles
Name: Name, dtype: object

### Challenge Complete!

Nice work! For future reference, write down the first letter of each column name in the order they appear in the output. **This information will be useful later on! :)**

## Challenge #3: Narrowing Down the Data

You've done great getting to this point. Now that we are getting more comfortable with data science, let's work on our skills at narrowing down the data. This will help us narrow down *which* pets we should consider based on different criteria. This part might get a little tough, so make sure you ask for help if you get stuck!

### Debrief: Getting Unique Entries in a Column

**Unique Values `.unique()`**: Sometimes instead of viewing all the values in a column, we just want to see the different types of values. For example, in this dataset it can be useful to know what types of animals there are without listing every single pet. We can do this by writing `dataset["column_name"].unique()`. Note that `.unique()` goes at the end. 
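A minimal sketch of `.unique()` on a made-up column with repeated values (the values are invented for this example):

```python
import pandas

# Invented sample data: some species appear more than once.
dataset = pandas.DataFrame({"Species": ["cat", "dog", "cat", "rabbit", "dog"]})

species_types = dataset["Species"].unique()   # each different value, listed once
print(species_types)
```

The values come back in the order they first appear in the column.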

In [None]:
# Run this cell
i.challenge3a()

### Debrief: Narrowing Down Data Based on Criteria

**Basic Operators**: In computer programming languages, we can do a lot of things using basic math alone. With data, we can use symbols like `>, <, ==, %, /, *` and others to do cool things. For instance, to check which entries in a column have a value of 5 or more, we could write `dataset["column_name"] >= 5`. This gives us `True` for every entry whose value in that column is greater than or equal to 5, and `False` for the rest. We can also compare against words, as long as they are in quotes (and remember they are case-sensitive), such as `dataset["column_name"] == "searchterm"`. The `==` means the value has to match exactly.

**Logical Operators**: To combine more than one condition, we need to use logical operators. For example, `&` means and, `|` (vertical bar) means or, and `~` means not. Each condition needs its own round brackets, because Python would otherwise try to work out `&` and `|` before the comparisons. Therefore we can do something like: `(dataset["column_name"] >= 5) & (dataset["column_name"] < 10)` or even: `(dataset["column_name"] >= 5) | (dataset["column_name"] < 2)` 

*Learn more about operators [here](https://www.w3schools.com/python/python_operators.asp).*
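Here is a small sketch of comparisons and logical operators on a made-up column of ages. The numbers are invented for illustration. Notice that every comparison sits inside its own round brackets before it is combined with `&` or `|`.

```python
import pandas

# Invented sample data for illustration.
dataset = pandas.DataFrame({"Age (years)": [1, 3, 6, 9, 12]})

at_least_5 = dataset["Age (years)"] >= 5
between_5_and_10 = (dataset["Age (years)"] >= 5) & (dataset["Age (years)"] < 10)
big_or_tiny = (dataset["Age (years)"] >= 5) | (dataset["Age (years)"] < 2)
under_5 = ~at_least_5    # ~ flips every True to False and every False to True

print(at_least_5.tolist())
print(between_5_and_10.tolist())
print(big_or_tiny.tolist())
print(under_5.tolist())
```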

In [None]:
# Run this cell
i.challenge3b()

### Debrief: Locating a Data Entry

**Label-based Location `.loc`**: You may notice that when doing the above you only get `True/False` values. We can get the actual rows of data by using `dataset.loc[label]`. Conditions made with basic operators can act as labels. For example, I can use something like: 
`dataset.loc[dataset["column_name"] >= 5]` to get all the rows with a value of 5 or more in a given column. You can also combine multiple conditions, such as `.loc[(condition1) & (condition2)]`.
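The sketch below shows the step from `True/False` values to actual rows, using a small made-up dataset. The pet names and values are invented for this example.

```python
import pandas

# Invented sample data for illustration.
dataset = pandas.DataFrame({
    "Name": ["Pixel", "Comet", "Nova", "Orbit"],
    "Species": ["cat", "dog", "dog", "rabbit"],
    "Age (years)": [2, 7, 5, 9],
})

is_older = dataset["Age (years)"] >= 5    # True/False for every row
older_pets = dataset.loc[is_older]        # only the rows marked True
older_dogs = dataset.loc[is_older & (dataset["Species"] == "dog")]

print(older_pets)
print(older_dogs)
```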

In [None]:
# Run this cell
i.challenge3c()

### Challenge Complete!

What is the name of the three-legged pet? **Write it down in a safe place for later!**


## Challenge #4: Choosing Our Pets



### Debrief: Sorting and Subsetting

**Sorting**: We can sort values in alphabetical or numerical order. To do this we just use `dataset.sort_values(by='column_name')`, or `dataset.sort_values(by='column_name', ascending=False)` if we want the reverse order. Because there are lots of ways to sort, we need to choose a specific column in the `by='column_name'` part. 

**Subsetting**: We can also create new datasets by picking the columns and rows that we want. 

To create a new dataset that only has the columns we want, we can write:
`custom_dataset = dataset[["column_name", "column_name2"]]`

To create a new dataset that only has the rows we want, we can write:
`custom_dataset = dataset.iloc[row1:row2]` (note that for this we use `.iloc` instead of `.loc`)

To do both at once: `dataset.loc[row1:row2, 'column_name']` 
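Here is a minimal sketch of sorting and subsetting on a small made-up dataset (the names and values are invented for illustration). One thing worth noticing: `.iloc[1:3]` stops before position 3, while `.loc[1:2, ...]` includes the row labelled 2.

```python
import pandas

# Invented sample data for illustration.
dataset = pandas.DataFrame({
    "Name": ["Pixel", "Comet", "Nova", "Orbit"],
    "Species": ["cat", "dog", "dog", "rabbit"],
    "Age (years)": [2, 7, 5, 9],
})

youngest_first = dataset.sort_values(by="Age (years)")
oldest_first = dataset.sort_values(by="Age (years)", ascending=False)

name_and_age = dataset[["Name", "Age (years)"]]   # choose columns
middle_rows = dataset.iloc[1:3]                   # rows at positions 1 and 2
middle_names = dataset.loc[1:2, "Name"]           # rows labelled 1 to 2, one column

print(youngest_first)
print(oldest_first)
print(name_and_age)
print(middle_rows)
print(middle_names)
```

Sorting gives back a new, sorted copy, so `dataset` itself keeps its original order.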

In [9]:
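import pandas
pets = pandas.read_csv("pets_from_bootstrap_world.csv")  # load the data so that `pets` is defined in this cell
pets_over_5lbs = pets['Weight (lbs)'] >= 5  # True/False for each pet
pets.loc[(pets['Weight (lbs)'] > 100) & (pets["Species"] == "dog")]  # rows for dogs over 100 lbs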

### Debrief: Getting Basic Statistics

There are lots of cool ways we can explore our data. The most straightforward ones are 

`dataset['column_name'].max()`,
`dataset['column_name'].min()`,
`dataset['column_name'].mean()`
and 
`dataset.describe()` or `dataset['column_name'].describe()`, which show several summary statistics at once.
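A short sketch of these statistics on a made-up column of weights (the numbers are invented for this example):

```python
import pandas

# Invented sample data for illustration.
dataset = pandas.DataFrame({"Weight (lbs)": [2.0, 8.0, 20.0, 50.0]})

heaviest = dataset["Weight (lbs)"].max()
lightest = dataset["Weight (lbs)"].min()
average = dataset["Weight (lbs)"].mean()
summary = dataset["Weight (lbs)"].describe()   # count, mean, min, max and more in one go

print(heaviest, lightest, average)
print(summary)
```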

In [None]:
# Sort time to adoption, age, etc. 
#========= PLACE ANSWER BELOW ===========#



In [None]:
def challenge3b():
    pass  # placeholder: this function has not been written yet

In [None]:
# subset species, gender, fixed, etc and assign to variable name
#========= PLACE ANSWER BELOW ===========#



In [None]:
## code for generate and append example

#pets = pandas.read_csv("pets_from_bootstrap_world.csv")
pets_list = pets["Species"].unique()
emoji_list = ["🐈", "🐕", "🦎", "🐇", "🕷"]
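# Loop over species/emoji pairs. A loop variable named `i` would overwrite the `interactive` module imported as `i`.
for species, emoji in zip(pets_list, emoji_list):
    pets.loc[pets["Species"] == species, "Emoji"] = emoji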
    
pets

## Challenge #5: Visualization


In [None]:
# Histogram of pet species, also include male/female split
#========= PLACE ANSWER BELOW ===========#




In [None]:
# Graph the ages of the species using groupby() 
#========= PLACE ANSWER BELOW ===========#

import matplotlib.pyplot as plt

# Average age of each species, worked out with groupby()
mean_age = pets.groupby("Species")["Age (years)"].mean()

ax = mean_age.plot(kind="bar", figsize=(16, 8))
ax.set_title('Average pet age by species')
ax.set_ylabel('Age (years)')
ax.set_xlabel('Species')


## Challenge #6: Making Observations & Drawing a Conclusion


In [None]:
# List 3 observations about the data you've noticed
#========= PLACE ANSWER BELOW ===========#




In [None]:
# Which pets would you bring to Mars? Use your observations from above to justify your answer. 
#========= PLACE ANSWER BELOW ===========#




[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)