### Working with pandas dataframes


[Official Documentation](https://pandas.pydata.org/docs/)

[Course page](https://ds.codeup.com/python/dataframes/)


<div class="alert alert-block alert-info">

#### Learning Goals
    
- Understand the structure of a dataframe
- View information about the contents of a dataframe
- Determine attributes of the data within the dataframe
- Manipulate columns
- Sort and filter contents of dataframe
- Create, modify, and drop columns
- View descriptive statistics for data

### Sources of dataframes

- Created from dictionaries, lists, or arrays
- Imported from Python libraries (e.g. `pydataset`)
- Read from `csv`, `tsv`, `xlsx` files
- Queried from SQL databases

---

### Import statements



In [None]:
# Imported libraries
import pandas as pd

# Numpy to build arrays
import numpy as np

# Source of datasets
from pydataset import data

# Seed NumPy's random number generator so the generated scores are reproducible
np.random.seed(123)

---

## Creating a dataframe

- Combine multiple series of equal length.


In [None]:
# Create a list of students
students = ['Sally', 'Jane', 'Suzie', 'Billy', 'Ada', 'John', 'Thomas', 'Marie', 'Albert', 'Richard', 'Isaac', 'Alan']


In [None]:

# Randomly generate 12 scores for each subject (1 per student)
# Store values as arrays
# All arrays must have the same length as the students list

math_grades = np.random.randint(low=60, high=100, size=len(students))

english_grades = np.random.randint(low=60, high=100, size=len(students))

reading_grades = np.random.randint(low=60, high=100, size=len(students))

In [None]:
# Create a dictionary with structure:
# 'column_name': <array or list>

df_dict = {
    'name': students,
    'math': math_grades,
    'english': english_grades,
    'reading': reading_grades,
}

In [None]:

# Use pd.DataFrame
df = pd.DataFrame(df_dict)

# View the type
type(df)

### Check data types
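
A minimal sketch of checking column types with the `.dtypes` attribute. The small dataframe here is hypothetical sample data standing in for the randomly generated `df` above. Numeric columns built from Python integers show an integer dtype; string columns usually show as `object` (or a dedicated string dtype, depending on the pandas version).

```python
import pandas as pd

# Hypothetical sample data standing in for the random scores generated above
df = pd.DataFrame({
    'name': ['Sally', 'Jane', 'Suzie', 'Billy'],
    'math': [62, 88, 94, 98],
    'english': [85, 79, 74, 96],
    'reading': [80, 67, 95, 88],
})

# .dtypes is an attribute (no parentheses); it returns a Series
# indexed by column name, with one dtype per column
column_types = df.dtypes
print(column_types)
```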

### Display a table for the dataframe

### Dimension of dataframe

The `.shape` attribute returns a tuple of the form `(rows, columns)`.
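
As a quick illustration, the tuple can be read directly or unpacked into two variables. The dataframe below is hypothetical sample data, not the `df` generated earlier.

```python
import pandas as pd

# Hypothetical sample data: 4 students, 4 columns
df = pd.DataFrame({
    'name': ['Sally', 'Jane', 'Suzie', 'Billy'],
    'math': [62, 88, 94, 98],
    'english': [85, 79, 74, 96],
    'reading': [80, 67, 95, 88],
})

print(df.shape)  # (4, 4)

# Unpack the tuple into separate row and column counts
n_rows, n_cols = df.shape
```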

### View portions of the dataframe

```python
df.head()    # first 5 rows by default; df.head(n) returns the first n rows
df.tail()    # last 5 rows by default; df.tail(n) returns the last n rows
df.sample()  # 1 random row by default; df.sample(n) returns n random rows
```

### Information about the dataframe's contents

| Name | Description |
| ---:|:------ |
|`#`  | Positional index of the column |
|`Column`| Name of the column |
|`Non-Null Count` | Number of non-missing values in the column |
|`Dtype` | Data type of the column's values |
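
The table above describes the output of `.info()`. A small sketch, using hypothetical sample data with one missing reading score so that the non-null count differs between columns:

```python
import pandas as pd

# Hypothetical sample data; Jane's reading score is deliberately missing
df = pd.DataFrame({
    'name': ['Sally', 'Jane', 'Suzie', 'Billy'],
    'math': [62, 88, 94, 98],
    'english': [85, 79, 74, 96],
    'reading': [80, None, 95, 88],
})

# Prints the summary table (column position, name, non-null count, dtype)
df.info()

# .count() returns the same non-null counts as a Series
non_null_counts = df.count()
```

Note that `.info()` prints its summary and returns `None`, so it is called for its display rather than assigned to a variable.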


### Data types

### Descriptive Statistics
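
One common way to view summary statistics is `.describe()`, sketched here on hypothetical sample data. By default, on a dataframe with mixed types, it summarizes only the numeric columns.

```python
import pandas as pd

# Hypothetical sample data standing in for the random scores generated above
df = pd.DataFrame({
    'name': ['Sally', 'Jane', 'Suzie', 'Billy'],
    'math': [62, 88, 94, 98],
    'english': [85, 79, 74, 96],
    'reading': [80, 67, 95, 88],
})

# Count, mean, std, min, quartiles, and max for each numeric column
summary = df.describe()
print(summary)

# Individual statistics are also available as methods on a column
math_mean = df['math'].mean()
```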

---

## Working with Columns

- View column information
- Rename columns
- Drop columns


### View Column Information
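
A short sketch of inspecting column labels through the `.columns` attribute, using hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data standing in for the random scores generated above
df = pd.DataFrame({
    'name': ['Sally', 'Jane', 'Suzie', 'Billy'],
    'math': [62, 88, 94, 98],
    'english': [85, 79, 74, 96],
    'reading': [80, 67, 95, 88],
})

# .columns is an Index object holding the column labels
print(df.columns)

# Convert to a plain Python list when that is easier to work with
column_names = df.columns.tolist()
```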

### Check data types

<div class="alert alert-block alert-info">
    
### Important Notes
    
- Keep in mind what data structure you are working with
- Recall that most pandas methods return a new object and leave the original unchanged, so reassign the result to keep it
- Check documentation for what the parameters and results of a pandas function are




### Dropping columns

- Drop all columns except for the student's name and math grade
- Assign the results to a new dataframe


### Tip

- Create a list with the columns you want to drop
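
One way to apply the tip, sketched on hypothetical sample data. The variable names `columns_to_drop` and `math_df` are illustrative choices.

```python
import pandas as pd

# Hypothetical sample data standing in for the random scores generated above
df = pd.DataFrame({
    'name': ['Sally', 'Jane', 'Suzie', 'Billy'],
    'math': [62, 88, 94, 98],
    'english': [85, 79, 74, 96],
    'reading': [80, 67, 95, 88],
})

# List the columns to remove, then pass the list to .drop
columns_to_drop = ['english', 'reading']

# .drop returns a new dataframe; df itself keeps all four columns
math_df = df.drop(columns=columns_to_drop)
print(math_df)
```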

### Renaming columns

- Rename the column `name` to `student`
- Use a dictionary structure
- Assign to a new dataframe

```python
{'old_name': 'new_name'}
```
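
A minimal sketch of passing that dictionary to `.rename` through its `columns` parameter, using hypothetical sample data. The name `renamed_df` is an illustrative choice.

```python
import pandas as pd

# Hypothetical sample data standing in for the random scores generated above
df = pd.DataFrame({
    'name': ['Sally', 'Jane', 'Suzie', 'Billy'],
    'math': [62, 88, 94, 98],
})

# Map the old column name to the new one; the result is a new dataframe
renamed_df = df.rename(columns={'name': 'student'})
print(renamed_df.columns.tolist())  # ['student', 'math']
```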

### Selecting a column
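
A brief sketch of bracket selection on hypothetical sample data. Single brackets with one column name return a Series, while a list of names inside the brackets returns a dataframe.

```python
import pandas as pd

# Hypothetical sample data standing in for the random scores generated above
df = pd.DataFrame({
    'name': ['Sally', 'Jane', 'Suzie', 'Billy'],
    'math': [62, 88, 94, 98],
    'english': [85, 79, 74, 96],
})

# A single column name returns a Series
math_series = df['math']

# A list of column names returns a DataFrame, even with one name
math_frame = df[['math']]

print(type(math_series))
print(type(math_frame))
```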

### Creating new columns


- Use a comparison operator on a column to return Boolean values 
- Create a new column with these contents
- Use `.assign` (or bracket assignment) to add the column of Boolean values
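
The steps above could look like the following sketch. The sample data, the passing threshold of 70, and the column name `passing_math` are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the random scores generated above
df = pd.DataFrame({
    'name': ['Sally', 'Jane', 'Suzie', 'Billy'],
    'math': [62, 88, 94, 98],
})

# A comparison on a column returns a Boolean Series (threshold is an assumption)
passing_math = df['math'] >= 70

# .assign returns a new dataframe with the extra column; df is unchanged
df_with_flag = df.assign(passing_math=passing_math)
print(df_with_flag)
```

Writing `df['passing_math'] = df['math'] >= 70` would add the same column, but it modifies `df` in place.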

---

### Sorting dataframes

- Use `.sort_values`
- Set which column to sort by
- Set ascending or descending
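
A small sketch of these options on hypothetical sample data, sorting by math score from highest to lowest:

```python
import pandas as pd

# Hypothetical sample data standing in for the random scores generated above
df = pd.DataFrame({
    'name': ['Sally', 'Jane', 'Suzie', 'Billy'],
    'math': [62, 88, 94, 98],
})

# by= sets the sort column; ascending=False sorts from highest to lowest
sorted_df = df.sort_values(by='math', ascending=False)
print(sorted_df)
```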

### Chaining Operations
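
Because most dataframe methods return a new dataframe, calls can be chained one after another. The sketch below, on hypothetical sample data, selects two columns, sorts them, and keeps the top two rows in a single expression.

```python
import pandas as pd

# Hypothetical sample data standing in for the random scores generated above
df = pd.DataFrame({
    'name': ['Sally', 'Jane', 'Suzie', 'Billy'],
    'math': [62, 88, 94, 98],
    'english': [85, 79, 74, 96],
})

# Wrapping the chain in parentheses allows one method call per line
top_english = (
    df[['name', 'english']]
    .sort_values(by='english', ascending=False)
    .head(2)
)
print(top_english)
```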

---

## Importing datasets

```python
import pandas as pd
from pydataset import data
```

### View available datasets

---
<div class="alert alert-block alert-success">
    
### In-class exercise 

- Randomly select one dataset from `pydataset`
- Load the dataset as a dataframe
- View information about the dataset


---

### Logical Operators


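 `&`    and

 `|`    or

 `~`    not

When combining conditions on columns, wrap each comparison in parentheses, e.g. `(df.math >= 80) & (df.english >= 80)`. The Python keywords `and`, `or`, and `not` do not work element-wise on a Series.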

### Find students who are eligible for:

- High-performer award (at least 80 in Math and English)
- Subject award (at least 90 in a subject)
- Exemplary award (at least 90 in all subjects)
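
One possible approach, sketched on hypothetical sample data rather than the randomly generated `df`. Each condition is a Boolean Series, combined with `&` or `|` and used as a filter inside the brackets.

```python
import pandas as pd

# Hypothetical sample data chosen so each award has a different set of winners
df = pd.DataFrame({
    'name': ['Sally', 'Suzie', 'Billy', 'Ada', 'John'],
    'math': [62, 94, 98, 92, 85],
    'english': [85, 74, 96, 91, 82],
    'reading': [80, 95, 88, 93, 70],
})

# High-performer: at least 80 in both math and English
high_performer = df[(df['math'] >= 80) & (df['english'] >= 80)]

# Subject award: at least 90 in any one subject
subject_award = df[
    (df['math'] >= 90) | (df['english'] >= 90) | (df['reading'] >= 90)
]

# Exemplary: at least 90 in every subject
exemplary = df[
    (df['math'] >= 90) & (df['english'] >= 90) & (df['reading'] >= 90)
]

print(high_performer['name'].tolist())
print(subject_award['name'].tolist())
print(exemplary['name'].tolist())
```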