# Data Analysis and Visualization in Python
## Data Types and Formats
Questions
* What types of data can be contained in a DataFrame?
* Why is the data type important?

Objectives
* Describe how information is stored in a Python DataFrame.
* Define the two main types of data in Python: text and numerics.
* Examine the structure of a DataFrame.
* Modify the format of values in a DataFrame.
* Describe how data types impact operations.
* Define, manipulate, and interconvert integers and floats in Python.
* Analyze datasets having missing/null values (NaN values).
* Write manipulated data to a file.

## Types of Data

### Checking the format of our data

In [None]:
# first make sure pandas is loaded
import pandas as pd

# read in the survey csv
surveys_df = pd.read_csv("../data/surveys.csv")

In [None]:
# the sex column holds text labels
surveys_df['sex'].dtype

In [None]:
# record_id holds whole numbers
surveys_df['record_id'].dtype

In [None]:
# dtype of every column at once
surveys_df.dtypes

Native Python Type | Pandas Type | Description
-------------------|-------------|------------
`str`              | `object`    | The most general dtype. Used for text columns and for columns with mixed types (numbers and strings).
`int`              | `int64`     | 64-bit integer.
`float`            | `float64`   | Numbers with decimals. If a column contains numbers and NaNs (see below), pandas defaults to `float64`.
N/A (see Python's `datetime` module) | `datetime64` | Values meant to hold time data.
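The inference rules in the table can be checked on a small DataFrame. The column names and values below are hypothetical, chosen only to trigger each row of the table; the exact datetime resolution shown by `dtypes` may vary between pandas versions.

```python
import pandas as pd

# Hypothetical toy data: one column per row of the table above
df = pd.DataFrame({
    "whole": [1, 2, 3],          # only integers -> int64
    "with_nan": [1, None, 3],    # integers plus a missing value -> float64
    "mixed": ["a", 1, 2.5],      # strings and numbers together -> object
})

# Text is not parsed as dates automatically; convert explicitly to get datetime64
df["when"] = pd.to_datetime(["2020-01-01", "2020-02-15", "2020-03-31"])

print(df.dtypes)
```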

## Working With Our Survey Data

In [None]:
# Convert the record_id field from an integer to a float
surveys_df['record_id'] = surveys_df['record_id'].astype('float64')
surveys_df['record_id'].dtype
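Converting integers to floats and back is lossless for whole numbers, but casting a float to an integer with `astype` truncates the fractional part (toward zero) rather than rounding it. A minimal sketch on hypothetical values:

```python
import pandas as pd

ids = pd.Series([1, 2, 3])                  # int64
as_float = ids.astype("float64")            # 1.0, 2.0, 3.0
back_to_int = as_float.astype("int64")      # 1, 2, 3: whole numbers survive the round trip

measurements = pd.Series([1.7, -1.7, 2.5])
truncated = measurements.astype("int64")    # 1, -1, 2: fractional part dropped toward zero

# Round first if rounding is what you want; .round() sends halves to the nearest even number
rounded = measurements.round().astype("int64")   # 2, -2, 2
```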

### Exercise - Changing Types
Try converting the column `plot_id` to the native Python `float` data type. Next, try converting `weight` to `int64` integers. What happens in each case?

In [None]:
surveys_df['plot_id'].astype("float")
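The cell above only displays a converted copy: `astype` returns a new Series and leaves the DataFrame unchanged until the result is assigned back. A small sketch, using a hypothetical stand-in for `surveys_df`:

```python
import pandas as pd

# Hypothetical stand-in for surveys_df with an integer plot_id column
df = pd.DataFrame({"plot_id": [2, 3, 2]})

df["plot_id"].astype("float")      # returns a new float64 Series; df is untouched
print(df["plot_id"].dtype)         # still int64

df["plot_id"] = df["plot_id"].astype("float")   # assign to keep the conversion
print(df["plot_id"].dtype)         # now float64
```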

In [None]:
# What happens if we try to convert weight values to int64 integers?
# Expect a ValueError: weight contains NaN values, which int64 cannot represent.
surveys_df['weight'].astype('int64')
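Because `int64` has no way to represent a missing value, casting a column that contains NaN with `astype('int64')` fails. Two common workarounds are to drop the missing rows first, or to use pandas' nullable integer dtype `"Int64"` (capital I), which stores missing entries as `pd.NA`. A sketch on hypothetical weight values; the cast to `"Int64"` assumes the non-missing floats are whole numbers.

```python
import pandas as pd

weights = pd.Series([40.0, None, 48.0])    # float64 because of the missing value

# Option 1: discard missing values, then cast to plain int64
dropped = weights.dropna().astype("int64")     # 40, 48

# Option 2: keep every row with the nullable integer dtype
nullable = weights.astype("Int64")             # 40, <NA>, 48
print(nullable.dtype, nullable.isna().sum())
```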

## Missing Data Values - NaN

In [None]:
# number of recorded weights and their mean
print(surveys_df['weight'].count(), surveys_df['weight'].mean())
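`count()` reports only non-missing values, and `mean()` skips NaN by default, so neither tells you how many rows are actually missing. A small sketch on hypothetical values shows how to get that number:

```python
import pandas as pd

weights = pd.Series([10.0, None, 30.0, None])

n_rows = len(weights)              # 4: all rows, missing or not
n_present = weights.count()        # 2: non-NaN values only
n_missing = weights.isna().sum()   # 2: rows where the value is NaN
average = weights.mean()           # 20.0: NaN values are skipped (skipna=True)
```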

### Getting Rid of the NaNs

In [None]:
# work on a copy so surveys_df keeps its original NaN values
df1 = surveys_df.copy()

In [None]:
# Replace missing weights with the mean of the recorded weights,
# which leaves the column mean unchanged
average_weight = df1['weight'].mean()
df1['weight'] = df1['weight'].fillna(average_weight)

In [None]:
print(df1['weight'].count(), df1['weight'].mean())
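Filling with the mean keeps the mean stable: every filled value equals the existing average, so the column mean does not move while the count rises to the full number of rows. Filling with a constant such as 0 would instead pull the mean toward that constant. A sketch on hypothetical values:

```python
import pandas as pd

weights = pd.Series([10.0, None, 30.0])

filled_mean = weights.fillna(weights.mean())   # 10.0, 20.0, 30.0
filled_zero = weights.fillna(0)                # 10.0, 0.0, 30.0

print(filled_mean.mean())   # 20.0: same as the original mean
print(filled_zero.mean())   # about 13.33: shifted by the fill value
```

Mean filling does still shrink the spread of the column, since the added values sit exactly at the center.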

## Writing Out Data to CSV

In [None]:
# keep only rows with no missing values in any column
df_na = surveys_df.dropna()
df_na
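By default `dropna()` removes a row if *any* of its columns is missing, which can discard more data than intended. The `subset` parameter restricts the check to the listed columns. A sketch on a hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "weight": [40.0, None, 48.0],
    "sex": ["M", "F", None],
})

any_missing = df.dropna()                     # keeps only row 0
weight_known = df.dropna(subset=["weight"])   # keeps rows 0 and 2
```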

In [None]:
# Write the DataFrame to CSV; index=False leaves out the row index column
df_na.to_csv('surveys_complete.csv', index=False)
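Passing `index=False` keeps the DataFrame's row labels out of the file; without it, reading the CSV back produces an extra `Unnamed: 0` column. The round trip can be checked in memory with `io.StringIO` instead of a real file (the toy data below is hypothetical):

```python
import io
import pandas as pd

df = pd.DataFrame({"record_id": [1, 2], "weight": [40.0, 48.0]})

# to_csv() with no path returns the CSV text as a string
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
with_index = pd.read_csv(io.StringIO(df.to_csv()))   # default index=True

print(list(without_index.columns))   # ['record_id', 'weight']
print(list(with_index.columns))      # ['Unnamed: 0', 'record_id', 'weight']
```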