# Lists, Dictionaries, DataFrames

**Objectives**
* Introduce the major collection types in Python (lists and dictionaries) and pandas DataFrames
* Practice using methods



## Lists


There are a few key types that contain multiple values. We will discuss two of those here: lists and dictionaries.

A list is an ordered collection of items. Lists have a length, and items can be accessed by their position (index).

```python
country_list = ["Afghanistan", "Canada", "Thailand", "Denmark", "Japan"]
print(type(country_list))
print(len(country_list))
```

```
<class 'list'>
5
```


You can index a list using square brackets. Positions start at 0, so the first item is at index 0:

```python
print(country_list[0])
print(country_list[1:4])
print(country_list[1:])

# you can also use negative numbers to index from the end
print(country_list[-1])
```

```
Afghanistan
['Canada', 'Thailand', 'Denmark']
['Canada', 'Thailand', 'Denmark', 'Japan']
Japan
```
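A few more indexing behaviors are easy to miss. The sketch below rebuilds the same five-country list so it runs on its own, and shows negative indexing, a slice with a step, and the difference between indexing past the end (an error) and slicing past the end (no error).

```python
country_list = ["Afghanistan", "Canada", "Thailand", "Denmark", "Japan"]

# Negative indices count backward from the end: -1 is the last item
last_country = country_list[-1]
print(last_country)  # Japan

# A third slice value is the step: every second item, starting at index 0
every_other = country_list[::2]
print(every_other)  # ['Afghanistan', 'Thailand', 'Japan']

# Slicing past the end is allowed; the slice simply stops at the last item
tail = country_list[3:10]
print(tail)  # ['Denmark', 'Japan']

# Indexing a single position past the end raises an IndexError
try:
    country_list[5]
except IndexError as err:
    print("IndexError:", err)
```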


## Challenge: Slicing Lists

Using the list in the next cell:

1) What does `thing[start:stop]` do?

2) Write three different ways to slice the list from `'elephant'` to the end.

```python
thing = [1, 3, 8, 'elephant', 'banana', 2]
start = 2
stop = 5
```

## Operations on lists

Types can have *methods* associated with them, accessed with dot notation (`listname.method()`). Methods are functions that operate on the specific object they are called from. The most common list method is `append()`, which adds an item to the end of a list.

```python
country_list.append('USA')
print(country_list)
```

```
['Afghanistan', 'Canada', 'Thailand', 'Denmark', 'Japan', 'USA']
```
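Lists have several other methods besides `append()`. Here is a short sketch on a made-up list `numbers` (not part of the lesson data) using `insert()`, `remove()`, `pop()`, `count()`, and `sort()`. Note that most of these change the list in place rather than creating a new one.

```python
numbers = [3, 1, 4, 1, 5]

numbers.insert(0, 9)     # put 9 at position 0     -> [9, 3, 1, 4, 1, 5]
numbers.remove(1)        # remove the FIRST 1      -> [9, 3, 4, 1, 5]
popped = numbers.pop()   # remove and return the last item (5) -> [9, 3, 4, 1]
ones = numbers.count(1)  # how many times 1 appears -> 1
numbers.sort()           # sort in place           -> [1, 3, 4, 9]

print(numbers, popped, ones)  # [1, 3, 4, 9] 5 1
```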

## Challenge: Append and Extend
Using the same list `thing`:

1) Append each item to the list individually: `'apple'`, `8`, and `9`. What is the output? Is it what you expected?

2) Make a list out of the items from (1) and append the list to `thing`. How does the output differ from (1)?

3) Look at the [documentation](https://docs.python.org/3/tutorial/datastructures.html) for the list method `.extend()`. Is there a way to rewrite your answer to (2) to use extend? How does that compare to the outputs of (1) and (2)?

4) What is one situation in which you would use `append` and one where you would use `extend`?

**Hint**: *Iterable* in Python means an object with multiple values that can be iterated through (including lists, tuples, even strings)

```python
thing = [1, 3, 8, 'elephant', 'banana', 2]
```


### Other types

There are many more data types in Python that you may run across. A few of them are:

* **tuple**: similar to a list, but its values can't be changed after it is created
* **set**: an unordered collection that contains only unique values
* **range**: an immutable sequence of integers
* And many [more](https://docs.python.org/3/library/stdtypes.html#immutable-sequence-types)!


We usually encounter these as the output of functions rather than creating them ourselves, but it's good to be aware of them.
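To make the three types above concrete, here is a small sketch with made-up values (`point`, `colors`, and `evens` are illustrative names only):

```python
# tuple: like a list, but it cannot be changed after creation
point = (3, 4)
try:
    point[0] = 10
except TypeError as err:
    print("TypeError:", err)

# set: keeps only unique values; the duplicate "red" is dropped
colors = {"red", "orange", "red", "yellow"}
print(len(colors))  # 3

# range: start is included, stop is excluded, and the third value is the step
evens = range(0, 10, 2)
print(list(evens))  # [0, 2, 4, 6, 8]
```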

## Dictionaries

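A dictionary is another data type, organized as key-value pairs: each key is used to look up its value. Keys must be immutable (hashable) values such as strings, numbers, or tuples, and each key can appear only once. Values can be any data type. Since Python 3.7, dictionaries keep their keys in the order they were inserted.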

```python
example_dict = {"name": "Forough Farrokhzad",
                "year of birth": 1935,
                "year of death": 1967,
                "place of birth": "Iran",
                "language": "Persian"}

example_dict['year of birth']
```

```
1935
```

```python
print(example_dict.keys())
```

```
dict_keys(['name', 'year of birth', 'year of death', 'place of birth', 'language'])
```
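Beyond looking up a value by its key, a few everyday dictionary operations are sketched below on the same `example_dict`. The `'occupation'` and `'nationality'` keys are used only for illustration; they are not part of the original example.

```python
example_dict = {"name": "Forough Farrokhzad",
                "year of birth": 1935,
                "year of death": 1967,
                "place of birth": "Iran",
                "language": "Persian"}

# Assigning to a key adds it (or overwrites it if it already exists)
example_dict["occupation"] = "poet"

# .get() returns a default instead of raising KeyError when the key is missing
nationality = example_dict.get("nationality", "unknown")

# `in` checks whether a key exists
has_language = "language" in example_dict

# .items() yields (key, value) pairs, in insertion order
for key, value in example_dict.items():
    print(f"{key}: {value}")
```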


## Challenge
Make a dictionary `fruits` from the following lists, using each list's name as its key.

```python
fruit = ['apple', 'orange', 'mango']
length = [3.2, 2.1, 3.1]
color = ['red', 'orange', 'yellow']

fruits = {'fruit': fruit,
          'length': length,
          'color': color}
```

Dictionaries are useful for hierarchical storage of data (they can even be nested!). They are also often used to initialize DataFrames, a useful data type for tabular data.
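A minimal sketch of a nested dictionary, reusing the values from `example_dict` above; the `poet` name and its structure are illustrative assumptions, not part of the lesson data:

```python
poet = {
    "name": "Forough Farrokhzad",
    "dates": {"birth": 1935, "death": 1967},  # a dictionary inside a dictionary
    "languages": ["Persian"],                 # a list inside a dictionary
}

# Chain square brackets to reach inside the inner structures
birth_year = poet["dates"]["birth"]
first_language = poet["languages"][0]

# Difference between the two years
year_span = poet["dates"]["death"] - poet["dates"]["birth"]
print(birth_year, first_language, year_span)  # 1935 Persian 32
```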

## DataFrames

We commonly want to represent tabular data in Python, and the most common way to do so is with a DataFrame, a type from the pandas package. We can build a DataFrame from a dictionary of lists:

```python
fruit = ['apple', 'orange', 'mango', 'strawberry', 'salmonberry', 'thimbleberry']
size = [3, 2, 3, 1, 1, 1]
color = ['red', 'orange', 'orange', 'red', 'orange', 'red']

fruits = {'fruit': fruit,
          'size': size,
          'color': color}
```

```python
import pandas as pd

df = pd.DataFrame(fruits)
df
```

|   | fruit | size | color |
|---|-------|------|-------|
| 0 | apple | 3 | red |
| 1 | orange | 2 | orange |
| 2 | mango | 3 | orange |
| 3 | strawberry | 1 | red |
| 4 | salmonberry | 1 | orange |
| 5 | thimbleberry | 1 | red |


The keys become column names, and each list becomes a column of values. In addition, there is an *index* on the left that labels each row; by default it counts up from 0.
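The index does not have to be the default row numbers. As a sketch, `set_index()` can promote an existing column to be the index, so rows are labeled by fruit name. This uses `.loc`, which is covered in the slicing section below, and rebuilds the `fruits` data from above so it runs on its own.

```python
import pandas as pd

fruits = {
    "fruit": ["apple", "orange", "mango", "strawberry", "salmonberry", "thimbleberry"],
    "size": [3, 2, 3, 1, 1, 1],
    "color": ["red", "orange", "orange", "red", "orange", "red"],
}
df = pd.DataFrame(fruits)

# By default the index is 0, 1, 2, ... (one label per row)
print(list(df.index))  # [0, 1, 2, 3, 4, 5]

# set_index returns a NEW DataFrame; 'fruit' moves from a column into the index
by_name = df.set_index("fruit")
print(list(by_name.columns))  # ['size', 'color']

# Rows can now be looked up by name instead of by number
print(by_name.loc["mango", "color"])  # orange
```

The original `df` is left unchanged, since `set_index()` returns a new DataFrame by default.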

`df.shape` gives the number of rows and columns, in that order. How many rows and columns does this DataFrame have?

```python
df.shape
```

```
(6, 3)
```

## Challenge: Initializing a DataFrame

The following code gives an error. Why does it have an error? What are some ways we could fix this?

```python
fruit = ['apple', 'orange']
length = [3.2, 2.1, 3.1]
color = ['red', 'orange', 'yellow']

fruits = {'fruit': fruit,
          'length': length,
          'color': color}
pd.DataFrame(fruits)
```

```
ValueError: All arrays must be of the same length
```

## DataFrame Slicing and Methods

We can choose a single column by its name. The result is a pandas `Series`, a one-dimensional labeled array (roughly, a vector):

```python
df.loc[:, 'fruit']  # colon selects all rows
```

```
0           apple
1          orange
2           mango
3      strawberry
4     salmonberry
5    thimbleberry
Name: fruit, dtype: object
```
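Square brackets offer a shorter way to select columns. A minimal sketch on the same fruit data: one name gives the same `Series` as `df.loc[:, 'fruit']`, while a list of names gives a smaller DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "orange", "mango", "strawberry", "salmonberry", "thimbleberry"],
    "size": [3, 2, 3, 1, 1, 1],
    "color": ["red", "orange", "orange", "red", "orange", "red"],
})

# One column name in square brackets returns a Series
fruit_col = df["fruit"]

# A LIST of column names returns a DataFrame with just those columns
subset = df[["fruit", "color"]]
print(subset.shape)  # (6, 2)
```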

We can choose a row by its index label using `df.loc[index, :]`:

```python
df.loc[0, :]  # colon selects all columns
```

```
fruit    apple
size         3
color      red
Name: 0, dtype: object
```

```python
# specify a single cell
df.loc[0, 'fruit']
```

```
'apple'
```
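`.loc` selects by index *label*. Its sibling `.iloc` selects by integer *position*, and `.loc` also accepts a boolean condition to filter rows. With the default index, labels and positions coincide, so `df.loc[0, ...]` and `df.iloc[0, ...]` pick the same row here; they diverge once the index is changed or rows are filtered. A sketch on the same fruit data:

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "orange", "mango", "strawberry", "salmonberry", "thimbleberry"],
    "size": [3, 2, 3, 1, 1, 1],
    "color": ["red", "orange", "orange", "red", "orange", "red"],
})

# .iloc selects by integer position: row position 0, column position 0
first_fruit = df.iloc[0, 0]
print(first_fruit)  # apple

# A boolean condition inside .loc keeps only the rows where it is True
red_fruits = df.loc[df["color"] == "red", "fruit"]
print(list(red_fruits))  # ['apple', 'strawberry', 'thimbleberry']

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
small_red = df[(df["color"] == "red") & (df["size"] == 1)]
print(len(small_red))  # 2
```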

DataFrames also have methods, including those for [merging](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html), [aggregation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html), [handling nulls](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isna.html), and many more. For example, `nunique()` counts the number of unique values in each column:

```python
df.nunique()
```

```
fruit    6
size     3
color    2
dtype: int64
```

We can also count how many times each unique value appears using `value_counts()`:

```python
df.value_counts(['color'])
```

```
color
orange    3
red       3
dtype: int64
```
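The three kinds of methods linked earlier (merging, aggregation, nulls) can be combined in a few lines. This is a sketch, not a recipe: the `prices` table and its values are made up for illustration, while `groupby()`, `merge()`, `isna()`, and `dropna()` are standard pandas methods.

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "orange", "mango", "strawberry", "salmonberry", "thimbleberry"],
    "size": [3, 2, 3, 1, 1, 1],
    "color": ["red", "orange", "orange", "red", "orange", "red"],
})

# Aggregation: group rows by color, then average 'size' within each group
mean_size = df.groupby("color")["size"].mean()
print(mean_size)

# Merging: a HYPOTHETICAL price table that only covers some of the fruits
prices = pd.DataFrame({
    "fruit": ["apple", "mango", "kiwi"],
    "price": [0.50, 1.20, 0.30],
})

# how="left" keeps every row of df; fruits with no match get NaN in 'price'
merged = df.merge(prices, on="fruit", how="left")

# Nulls: count the missing prices, then drop the rows that have them
n_missing = merged["price"].isna().sum()
priced = merged.dropna(subset=["price"])
print(n_missing, len(priced))  # 4 2
```

A left merge is used here so that no fruit disappears from `df`; an inner merge (the default) would keep only the fruits that appear in both tables.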

## Challenge
**UNDER CONSTRUCTION**

A challenge using DataFrame methods to tackle a common problem.


We will use DataFrames and other types throughout this workshop, especially during Part 3. For more, consider the Python Data Wrangling workshop (almost entirely dealing with DataFrames!) and other resources.