# Data Analysis and Visualization in Python
## Data Types and Formats
Questions
* How is the information stored in a DataFrame?
* How can we change the format of the data?

Objectives
* Describe how information is stored in a Python DataFrame.
* Define the two main types of data in Python: character (text) data and numeric data.
* Examine the structure of a DataFrame.
* Modify the format of values in a DataFrame.
* Describe how data types impact operations.
* Define, manipulate, and interconvert integers and floats in Python.
* Analyze datasets that contain missing/null (NaN) values.

### How to Use Jupyter
When a cell is in edit mode:

  Shortcut  | Description
----------- | -----------
Shift+Enter | Run the cell and move to the next one
Tab         | Indent code or trigger auto-completion
Esc         | Switch to command mode

When a cell is in command mode:

  Shortcut   | Description
------------ | -----------
Shift+Enter  | Run the cell and move to the next one
Double-click | Switch to edit mode
Enter        | Switch to edit mode
A            | Insert a cell above
B            | Insert a cell below
C            | Copy the current cell
V            | Paste the copied cell below
D D          | Delete the current cell (press D twice)

To restart the kernel and clear the output of all cells:
* In the top menu, select Kernel -> Restart & Clear Output

## Making Sure Our Data Are Loaded

In [None]:
# first make sure pandas is loaded
import pandas as pd

# read in the survey csv
surveys_df = pd.read_csv("../data/surveys.csv")
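If `../data/surveys.csv` is not at hand, one way to confirm that `pd.read_csv` behaves as expected is to parse a tiny inline sample. The column names below mirror the ones used later in this lesson, but the rows are made up for illustration. Note that the empty `wgt` field is read as NaN.

```python
import io

import pandas as pd

# Made-up CSV text shaped like surveys.csv (values are illustrative only).
csv_text = """record_id,sex,wgt
1,M,40
2,F,
3,F,25
"""

# read_csv accepts any file-like object, so StringIO stands in for a file on disk.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)   # (3, 3)
print(sample_df)         # the empty wgt field shows up as NaN
```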

## Types of Data
### Numeric Data Types

In [None]:
###(### ** 48)

In [None]:
type(### ** 48)
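For reference, here is a small sketch of Python's two built-in numeric types, using 2 as an arbitrary base of our own choosing. Raising an integer to an integer power gives an `int` (Python integers have arbitrary precision, so large results do not overflow), while a fractional exponent gives a `float`.

```python
big = 2 ** 48        # int ** int stays an int
ratio = 2 ** 0.5     # a fractional exponent produces a float

print(type(big))     # <class 'int'>
print(type(ratio))   # <class 'float'>
print(big)           # 281474976710656
```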

### Character Data Types

In [None]:
type(###48")
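The same digits behave very differently depending on whether they are stored as text or as a number. This sketch (with values of our own choosing) shows that `+` concatenates strings but adds numbers, and that a string must be converted before doing arithmetic with it.

```python
text = "48"
number = 48

print(type(text))      # <class 'str'>
print(text + "2")      # 482 (string concatenation)
print(number + 2)      # 50 (numeric addition)
print(int(text) + 2)   # 50 (convert the string first, then add)
```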

### Checking the Format of Our Data

In [None]:
type(###)

In [None]:
surveys_df['sex']###

In [None]:
surveys_df['record_id'].dtype

In [None]:
surveys_df.dtype###

Native Python Type | Pandas Type | Description
-------------------|-------------|------------
`str`              | `object`    | The most general dtype. Assigned to a column that holds strings or mixed types (e.g., numbers and strings).
`int`              | `int64`     | 64-bit integer.
`float`            | `float64`   | Numbers with a decimal (fractional) part. If a column contains numbers and NaNs (see below), pandas defaults to `float64`.
N/A                | `datetime64`| Values meant to hold date and time data.
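To see these dtypes side by side, the sketch below builds a small DataFrame with made-up values, one column per row of the table, and prints `df.dtypes`. The exact names printed can vary slightly between pandas versions (for example, recent releases may report a dedicated string dtype for text columns, or a different datetime resolution), so the checks use the `pd.api.types` helpers rather than exact names.

```python
import pandas as pd

# Made-up data: one column per dtype in the table above.
df = pd.DataFrame({
    "species": ["NL", "DM", "PF"],        # strings -> object (newer pandas may use a string dtype)
    "count": [3, 5, 2],                   # Python ints -> int64
    "wgt": [40.0, None, 25.0],            # numbers plus a missing value -> float64
    "date": pd.to_datetime(["1977-07-16", "1977-07-17", "1977-07-18"]),  # -> datetime64
    "mixed": [1, "two", 3.0],             # mixed types -> object
})

print(df.dtypes)

print(pd.api.types.is_integer_dtype(df["count"]))           # True
print(pd.api.types.is_float_dtype(df["wgt"]))               # True
print(pd.api.types.is_datetime64_any_dtype(df["date"]))     # True
```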

### Working With Integers and Floats

In [None]:
print(5 + 5)
print(24 - 4)

In [None]:
print(5 / 9)
print(10 / 3)

###print(5 ### 9)
###print(10 ### 3)
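For comparison, here is how Python 3's division-related operators behave on integers: `/` always performs true division and returns a `float`, `//` performs floor division, and `%` gives the remainder. The operands below are our own examples.

```python
print(5 / 9)     # true division: a float, even for two ints
print(10 / 3)
print(5 // 9)    # 0: floor division
print(10 // 3)   # 3
print(10 % 3)    # 1: remainder
print(-7 // 2)   # -4: floor division rounds toward negative infinity
```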

In [None]:
# convert a to an integer
a = 7###
print(###(a))

# convert b to a float
b = 7
print(###(b))
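A detail worth knowing when converting between the two types: `int()` truncates toward zero rather than rounding. The sketch below assumes `a = 7.83` as an example value; use `round()` when the nearest integer is wanted.

```python
a = 7.83   # example value (an assumption for this sketch)
b = 7

print(int(a))       # 7: the fractional part is dropped
print(int(-7.83))   # -7, not -8: truncation toward zero
print(round(a))     # 8: nearest integer
print(float(b))     # 7.0
```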

## Working With Our Survey Data

In [None]:
# Convert the record_id field from an integer to a float
surveys_df['record_id'] = surveys_df['record_id'].###('###')
surveys_df['record_id'].dtype
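`Series.astype` returns a new Series and leaves the original unchanged, which is why the cell above reassigns the result to the column. A minimal sketch with hypothetical record IDs:

```python
import pandas as pd

# Hypothetical stand-in for surveys_df['record_id'].
ids = pd.Series([1, 2, 3])
print(ids.dtype)              # int64

ids_float = ids.astype("float64")
print(ids_float.dtype)        # float64
print(ids_float.tolist())     # [1.0, 2.0, 3.0]

# The original Series is unchanged because astype returned a new object.
print(ids.dtype)              # int64
```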

In [None]:
# What happens if we try to convert weight values to integers?
surveys_df['wgt'].astype('###')
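A plain NumPy-backed integer column cannot hold NaN, so casting a column with missing values to `int64` raises a `ValueError`. The sketch below uses hypothetical weights and also shows pandas' nullable `"Int64"` dtype (capital "I"), which stores missing entries as `<NA>` instead.

```python
import numpy as np
import pandas as pd

# Hypothetical weights with one missing value, like the wgt column.
wgt = pd.Series([40.0, np.nan, 25.0])

try:
    wgt.astype("int64")
except ValueError as err:
    # NaN has no integer representation, so pandas refuses the cast.
    print("Cast failed:", err)

# The nullable integer dtype keeps the missing value as <NA>.
wgt_nullable = wgt.astype("Int64")
print(wgt_nullable)
```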

In [None]:
surveys_df['wgt'].mean()
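The mean can be computed even though some weights are missing because pandas reductions skip NaN by default (`skipna=True`). A short sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

wgt = pd.Series([40.0, np.nan, 25.0])   # hypothetical weights

print(wgt.mean())               # 32.5: NaN is skipped by default
print(wgt.mean(skipna=False))   # nan: any missing value makes the result missing
print(wgt.count())              # 2: count() also ignores NaN
```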

### Missing Data Values - NaN

In [None]:
len(surveys_df[###(surveys_df.wgt)])

In [None]:
# how many rows have weight values?
len(surveys_df[surveys_df.wgt ###])
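Missing values can also be counted without filtering rows: `isnull()` returns a boolean Series, and summing it counts the `True` values (`notnull()` is its complement). A sketch on a hypothetical stand-in for `surveys_df`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for surveys_df with a partly missing wgt column.
df = pd.DataFrame({"record_id": [1, 2, 3, 4], "wgt": [40.0, np.nan, 25.0, np.nan]})

n_missing = df["wgt"].isnull().sum()    # True counts as 1, False as 0
n_present = df["wgt"].notnull().sum()

# Boolean indexing with pd.isnull gives the same count of missing rows.
assert len(df[pd.isnull(df.wgt)]) == n_missing
print(n_missing, n_present)             # 2 2
```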

In [None]:
df1 = surveys_df.###()

# fill all NaN values with 0
df1['wgt'] = df1['wgt'].###(###)

In [None]:
df1['wgt'].mean()
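Filling NaN with 0 changes summary statistics, because the zeros are then treated as real measurements. The sketch below uses hypothetical weights; it also uses `.copy()` so that filling the copy leaves the original DataFrame's NaNs untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for surveys_df: two of the four weights are missing.
df = pd.DataFrame({"wgt": [40.0, np.nan, 25.0, np.nan]})

df1 = df.copy()                      # independent copy
df1["wgt"] = df1["wgt"].fillna(0)

print(df["wgt"].mean())              # 32.5: mean of the two recorded weights
print(df1["wgt"].mean())             # 16.25: the zeros pull the mean down
print(df["wgt"].isnull().sum())      # 2: the original still has its NaNs
```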

In [None]:
df1['wgt'] = surveys_df['wgt'].fillna(surveys_df['wgt'].###)

print(df1['wgt'].mean())
print(surveys_df['wgt'].mean())
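Filling missing values with the column mean preserves the mean but not the spread: every imputed value sits exactly at the mean, so the standard deviation shrinks. A sketch with hypothetical weights:

```python
import numpy as np
import pandas as pd

wgt = pd.Series([40.0, np.nan, 25.0, np.nan])   # hypothetical weights

filled_mean = wgt.fillna(wgt.mean())

print(filled_mean.tolist())   # [40.0, 32.5, 25.0, 32.5]
print(filled_mean.mean())     # 32.5: same as the original mean
print(wgt.std(), filled_mean.std())   # the filled Series has a smaller spread
```

Whether mean imputation is appropriate depends on the analysis; it keeps averages unchanged but understates variability.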