# Load data with Pandas 

Let's open some text data that is stored on disk. The file is named `doggy-boot-harness.csv`.

In [1]:
import pandas
dataset = pandas.read_csv('doggy-boot-harness.csv')

dataset.head()

   boot_size  harness_size     sex  age_years
0         39            58    male       12.0
1         38            58    male        9.6
2         37            52  female        8.6
3         39            58    male       10.2
4         38            57    male        7.8


This data set has information about dogs, including their doggy boot size, harness size, sex and age in years.
Data is stored as columns and rows.
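
To make the "columns and rows" idea concrete, here is a minimal sketch that inspects a table's structure. It assumes a small stand-in DataFrame named `sample` (a name introduced here for illustration), built by hand from the first five rows shown above so that it runs without the CSV file.

```python
import pandas

# Stand-in for the CSV file: the first five rows shown above
sample = pandas.DataFrame({
    "boot_size": [39, 38, 37, 39, 38],
    "harness_size": [58, 58, 52, 58, 57],
    "sex": ["male", "male", "female", "male", "male"],
    "age_years": [12.0, 9.6, 8.6, 10.2, 7.8],
})

# Number of rows and columns, as a (rows, columns) tuple
print(sample.shape)

# Names of the columns
print(list(sample.columns))

# Data type stored in each column
print(sample.dtypes)
```

The same three attributes should work on `dataset` once it has been loaded with `read_csv`.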


# Filter data by Columns

Data is easy to filter by column. We can type either:
* `dataset.column_name` or 
* `dataset['column_name']`
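
As a quick check that the two forms above return the same data, here is a small sketch. It uses a hand-built stand-in DataFrame called `sample` (an illustrative name, not part of the exercise) containing the first three rows of the table. Bracket notation is the more general of the two: it also works when a column name is not a valid Python identifier or clashes with an existing DataFrame attribute, and it accepts a list of names to select several columns at once.

```python
import pandas

# Stand-in data: the first three rows of the table above
sample = pandas.DataFrame({
    "boot_size": [39, 38, 37],
    "harness_size": [58, 58, 52],
})

# Both forms select the same column
by_attribute = sample.harness_size
by_brackets = sample["harness_size"]
print(by_attribute.equals(by_brackets))

# Passing a list of names selects several columns and returns a DataFrame
both_columns = sample[["boot_size", "harness_size"]]
print(both_columns)
```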

Now we'll look at the harness sizes and delete the sex and age columns.

In [2]:

# Look at the harness sizes
print("Harness sizes")
print(dataset.harness_size)

# Remove the sex and age-in-years columns.
del dataset["sex"]
del dataset["age_years"]

# Print the column names
print("\nAvailable columns after deleting sex and age information:")
print(dataset.columns.values)

Harness sizes
0     58
1     58
2     52
3     58
4     57
5     52
6     55
7     53
8     49
9     54
10    59
11    56
12    53
13    58
14    57
15    58
16    56
17    51
18    50
19    59
20    59
21    59
22    55
23    50
24    55
25    52
26    53
27    54
28    61
29    56
30    55
31    60
32    57
33    56
34    61
35    58
36    53
37    57
38    57
39    55
40    60
41    51
42    52
43    56
44    55
45    57
46    58
47    57
48    51
49    59
Name: harness_size, dtype: int64

Available columns after deleting sex and age information:
['boot_size' 'harness_size']
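
The `del` statement used above changes `dataset` in place. If we wanted to keep the original table intact, one alternative is the `drop()` method, which returns a new DataFrame. The sketch below assumes a hand-built stand-in called `sample` (an illustrative name) made from the first five rows of the data.

```python
import pandas

# Stand-in data: the first five rows of the original table
sample = pandas.DataFrame({
    "boot_size": [39, 38, 37, 39, 38],
    "harness_size": [58, 58, 52, 58, 57],
    "sex": ["male", "male", "female", "male", "male"],
    "age_years": [12.0, 9.6, 8.6, 10.2, 7.8],
})

# drop() returns a new DataFrame without the listed columns
trimmed = sample.drop(columns=["sex", "age_years"])

print("Columns in the trimmed copy:", list(trimmed.columns))
print("Columns in the original:", list(sample.columns))
```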


# Filter data by Rows



In [3]:
# Print how many rows of data we have
print(f"We have {len(dataset)} rows of data")

We have 50 rows of data


We can get data from the top of the table by using the `head()` function, or from the bottom of the table by using the `tail()` function.

Both functions make a shallow copy of a section of our dataframe. Here, we're sending these copies to the `print()` function. The head and tail views can also be used elsewhere, such as in analyses or graphs.

In [4]:
# Print the data at the top of the table
print("TOP OF TABLE")
print(dataset.head())

# Print the data at the bottom of the table
print("\nBOTTOM OF TABLE")
print(dataset.tail())

TOP OF TABLE
   boot_size  harness_size
0         39            58
1         38            58
2         37            52
3         39            58
4         38            57

BOTTOM OF TABLE
    boot_size  harness_size
45         41            57
46         39            58
47         39            57
48         35            51
49         39            59
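
Both `head()` and `tail()` return five rows by default, and both accept an optional row count. A minimal sketch, assuming a hand-built stand-in called `sample` (an illustrative name) made from the first five rows of the table:

```python
import pandas

# Stand-in data: the first five rows of the table
sample = pandas.DataFrame({
    "boot_size": [39, 38, 37, 39, 38],
    "harness_size": [58, 58, 52, 58, 57],
})

# Ask for a specific number of rows instead of the default five
first_two = sample.head(2)
last_two = sample.tail(2)

print("FIRST TWO ROWS")
print(first_two)

print("\nLAST TWO ROWS")
print(last_two)
```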


We can also filter logically. For example, we can look at data for dogs whose harness size is smaller than 55.

## Subset a number of rows using harness size

This works by calculating a `True` or `False` value for each row, then keeping only those rows where the value is `True`.

In [5]:
data_from_small_dogs = dataset[dataset.harness_size < 55].copy()
data_from_small_dogs.head()

   boot_size  harness_size
2         37            52
5         35            52
7         36            53
8         35            49
9         40            54
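
To see the `True`/`False` step on its own, we can store the comparison in a variable before using it as a filter. This is a sketch on a hand-built stand-in called `sample` (an illustrative name) that holds the first six rows of the table; `is_small` is likewise a name chosen for this example.

```python
import pandas

# Stand-in data: rows 0 to 5 of the table
sample = pandas.DataFrame({
    "boot_size": [39, 38, 37, 39, 38, 35],
    "harness_size": [58, 58, 52, 58, 57, 52],
})

# Step 1: compare every row, giving one True or False per row
is_small = sample.harness_size < 55
print(is_small)

# Step 2: keep only the rows where the value is True
small_dogs = sample[is_small].copy()
print(small_dogs)
```

Notice that the kept rows retain their original index labels, which is why the filtered output above starts at 2 rather than 0.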


## Subset a number of rows using boot size

We can also create a new group with smaller paws:


In [6]:
data_smaller_paws = dataset[dataset.boot_size < 40].copy()
data_smaller_paws.head()

   boot_size  harness_size
0         39            58
1         38            58
2         37            52
3         39            58
4         38            57
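
The two filters we've used so far can also be combined. The sketch below is an illustration rather than part of the exercise: it assumes a hand-built stand-in called `sample` containing only rows that appear in the outputs above, and combines conditions with `&` (and) and `~` (not). Each condition needs its own parentheses, because `&` binds more tightly than `<` in Python.

```python
import pandas

# Stand-in data: rows that appear in the outputs above, with their original labels
sample = pandas.DataFrame(
    {
        "boot_size": [39, 38, 37, 39, 38, 35, 36, 35, 40],
        "harness_size": [58, 58, 52, 58, 57, 52, 53, 49, 54],
    },
    index=[0, 1, 2, 3, 4, 5, 7, 8, 9],
)

small_paws = sample.boot_size < 40
small_harness = sample.harness_size < 55

# Dogs with both small paws AND a small harness
small_both = sample[small_paws & small_harness].copy()
print(small_both)

# Dogs with a small harness but NOT small paws
small_harness_only = sample[small_harness & ~small_paws].copy()
print(small_harness_only)
```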


# Graph data

Graphing data is often the easiest way to understand it. 

In these exercises, we usually make our graphs using code in a custom file we've created, called `graphing.py`, which you can look at on our GitHub page.

Here, however, we'll practice making a graph without this custom code.

Let's make a simple graph of harness size versus boot size for our avalanche dogs with smaller feet.

In [8]:
# Load and prepare Plotly to create our graphs
import plotly.express as px

# Show a graph of harness size by boot size
px.scatter(data_smaller_paws, x="harness_size", y="boot_size")


# Create new columns

The preceding graph shows the relationship we want to investigate for our store, but some customers might want harness-size lists in inches, not centimeters. How can we view these harness sizes in imperial units?

To do this, we'll need to create a new column called `harness_size_imperial` and put that on the x-axis instead.

Creating new columns uses very similar syntax to what we've seen before. 

In [9]:
# Convert harness sizes from metric to imperial units 
# and save the result to a new column

data_smaller_paws['harness_size_imperial'] = data_smaller_paws['harness_size'] * 0.393701

# Show a graph of harness size in imperial units
# by boot size
px.scatter(data_smaller_paws, x="harness_size_imperial", y="boot_size")

We've now graphed our new column of data (`harness_size_imperial`) against boot size for dogs with small paws.
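
It's worth seeing why we called `.copy()` when we created `data_smaller_paws`. Because the filtered table is an independent copy, adding a column to it leaves the original table unchanged. Without the copy, depending on the pandas version, assigning a new column to a filtered selection may produce a `SettingWithCopyWarning`. The sketch below assumes a hand-built stand-in called `sample` and a constant named `CM_TO_INCHES`, both introduced only for this example.

```python
import pandas

# Conversion factor used above (centimeters to inches)
CM_TO_INCHES = 0.393701

# Stand-in data: rows 0 to 5 of the table
sample = pandas.DataFrame({
    "boot_size": [39, 38, 37, 39, 38, 35],
    "harness_size": [58, 58, 52, 58, 57, 52],
})

# Filter, then copy, so that the new table is independent of `sample`
small_paws = sample[sample.boot_size < 40].copy()

# Add the converted column to the copy only
small_paws["harness_size_imperial"] = small_paws["harness_size"] * CM_TO_INCHES

print("Columns in the copy:", list(small_paws.columns))
print("Columns in the original:", list(sample.columns))
```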

# Summary
We've introduced working with data in Python, including:

* Opening data from a file into a DataFrame (table)
* Inspecting the top and bottom of the dataframe
* Adding and removing columns of data
* Filtering rows of data based on criteria
* Graphing data to understand trends

Learning to work with dataframes can feel tedious or dry, but keep going, because these basic skills are critical to unlocking exciting machine-learning techniques that we'll cover in later modules.