# Chapter 1: DataFrames



## Inspecting a DataFrame
- head() returns the first few rows (the “head” of the DataFrame).
- info() shows information on each of the columns, such as the data type and number of missing values.
- shape returns the number of rows and columns of the DataFrame.
- describe() calculates a few summary statistics for each column.

**Examples**
```python
print(homelessness.head())
# .info() prints its report itself and returns None, so no print() is needed
homelessness.info()
print(homelessness.shape)
print(homelessness.describe())
```
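The `homelessness` dataset is not defined in these notes, so here is a minimal, self-contained sketch of the same four inspection tools. The `dogs` table and its values are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, not the homelessness dataset used in this chapter
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador"],
    "height_cm": [56.0, 43.0, 46.0, None],  # one missing value
})

print(dogs.head(2))     # first two rows only (the default is five)
dogs.info()             # prints dtypes and non-null counts, returns None
print(dogs.shape)       # (4, 3): 4 rows, 3 columns
print(dogs.describe())  # summary statistics for the numeric column
```

Note that `shape` is an attribute, so it is written without parentheses, while the other three are methods.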


**Parts of a DataFrame**
- values: A two-dimensional NumPy array of values.
- columns: An index of columns: the column names.
- index: An index for the rows: either row numbers or row names.


```python
import pandas as pd 

print(homelessness.values)

print(homelessness.columns)

print(homelessness.index)
```
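As a rough illustration of the three parts, the sketch below builds a tiny DataFrame with row names. The `dogs` data is hypothetical and only serves to show what each attribute holds.

```python
import pandas as pd

# Hypothetical sample data with row names instead of row numbers
dogs = pd.DataFrame(
    {"breed": ["Labrador", "Poodle"], "weight_kg": [29, 24]},
    index=["Bella", "Charlie"],
)

print(dogs.values)   # two-dimensional NumPy array of the cell values
print(dogs.columns)  # the column names
print(dogs.index)    # the row names
```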

### Sorting rows

You can sort the rows by passing a column name to .sort_values().

In cases where rows have the same value (this is common if you sort on a categorical variable), you may wish to break the ties by sorting on another column. You can sort on multiple columns in this way by passing a list of column names.

- One column: `df.sort_values("breed")`
- Multiple columns: `df.sort_values(["breed", "weight_kg"])`

By combining .sort_values() with .head(), you can answer questions of the form "What are the top cases where…?".
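The sketch below shows both ideas on a small, made-up `dogs` table (the names and weights are hypothetical): a "top cases" question answered with .sort_values() followed by .head(), and a multi-column sort that breaks ties.

```python
import pandas as pd

# Hypothetical sample data
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador", "Poodle"],
    "weight_kg": [29, 24, 24, 31, 22],
})

# "Which two dogs are heaviest?": sort descending, then keep the first rows
heaviest = dogs.sort_values("weight_kg", ascending=False).head(2)
print(heaviest)

# Sort by breed A to Z, breaking ties with the heaviest dog first
by_breed = dogs.sort_values(["breed", "weight_kg"], ascending=[True, False])
print(by_breed)
```

When sorting on several columns, `ascending` can take a list with one True or False per column.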



```python
# Sort homelessness by individuals
homelessness_ind = homelessness.sort_values("individuals")

# Print the top few rows
print(homelessness_ind.head())
```

```python
# Sort homelessness by descending family members
homelessness_fam = homelessness.sort_values("family_members", ascending=False)

# Print the top few rows
print(homelessness_fam.head())
```

```python
# Sort homelessness by region, then descending family members
homelessness_reg_fam = homelessness.sort_values(
    ["region", "family_members"], ascending=[True, False]
)
```

### Subsetting columns

```python
# Select a single column with square brackets (here, the individuals column)
individuals = homelessness["individuals"]

# Double square brackets are needed to select several columns:
# the inner pair is a list of column names
state_fam = homelessness[["state", "family_members"]]
```
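To make the difference between single and double brackets concrete, here is a small sketch on made-up data (the `dogs` table is hypothetical). Single brackets with one name give a Series, while a list of names gives a DataFrame, even when the list holds only one name.

```python
import pandas as pd

# Hypothetical sample data
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie"],
    "breed": ["Labrador", "Poodle"],
    "weight_kg": [29, 24],
})

weights = dogs["weight_kg"]                # single brackets: a Series
name_weight = dogs[["name", "weight_kg"]]  # list of names: a DataFrame
one_col_df = dogs[["weight_kg"]]           # one-item list: still a DataFrame

print(type(weights).__name__)      # Series
print(type(name_weight).__name__)  # DataFrame
print(one_col_df.shape)            # (2, 1)
```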


### Subsetting rows


There are many ways to subset a DataFrame. Perhaps the most common is to use a relational operator to produce True or False for each row, then pass that result inside square brackets.

**Example**
```python
dogs[dogs["height_cm"] > 60]
dogs[dogs["color"] == "tan"]
```

```python
# Filter for rows where individuals is greater than 10000
ind_gt_10k = homelessness[homelessness["individuals"] > 10000]

# Filter for rows where region is Mountain
mountain_reg = homelessness[homelessness["region"] == "Mountain"]

# Filter for rows where family_members is less than 1000
# and region is Pacific
fam_lt_1k_pac = homelessness[
    (homelessness["family_members"] < 1000) & (homelessness["region"] == "Pacific")
]
```
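It can help to see the intermediate True/False values that drive these filters. The following sketch uses a small, made-up `dogs` table (hypothetical data) and stores each condition in a variable before combining them with `&` (and) or `|` (or).

```python
import pandas as pd

# Hypothetical sample data
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "color": ["tan", "black", "tan", "brown"],
    "height_cm": [56, 43, 62, 65],
})

is_tall = dogs["height_cm"] > 60  # boolean Series, one value per row
is_tan = dogs["color"] == "tan"
print(is_tall)

tall_and_tan = dogs[is_tall & is_tan]  # rows where both conditions hold
tall_or_tan = dogs[is_tall | is_tan]   # rows where at least one holds
print(tall_and_tan)
print(tall_or_tan)
```

When the conditions are written inline rather than stored in variables, each one needs its own parentheses, because `&` and `|` bind more tightly than comparison operators such as `<` and `==`.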

```python
# To filter a categorical variable on several allowed values, use the .isin() method

# The Mojave Desert states
canu = ["California", "Arizona", "Nevada", "Utah"]

# Filter for rows in the Mojave Desert states
mojave_homelessness = homelessness[homelessness["state"].isin(canu)]

# See the result
print(mojave_homelessness)
```

**Simple example of the .isin() method**
```python
colors = ["brown", "black", "tan"]
condition = dogs["color"].isin(colors)
dogs[condition]
```
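One way to see what .isin() does is to compare it with the longer version written with `==` and `|`. This is a small sketch on made-up data (the `dogs` table is hypothetical); both filters should select the same rows.

```python
import pandas as pd

# Hypothetical sample data
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "color": ["tan", "black", "white", "brown"],
})

colors = ["brown", "black", "tan"]

# Short form: one membership test per row
with_isin = dogs[dogs["color"].isin(colors)]

# Long form: one equality test per allowed value, joined with "or"
with_or = dogs[
    (dogs["color"] == "brown")
    | (dogs["color"] == "black")
    | (dogs["color"] == "tan")
]

print(with_isin)
print(with_isin.equals(with_or))  # True
```

The .isin() form stays the same length however many values are in the list, which is why it is usually preferred for this kind of filter.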


### New columns
