# Data Analysis and Visualization in Python
## Data Types and Formats
Questions
* What types of data can be contained in a DataFrame?
* Why is the data type important?

Objectives
* Describe how information is stored in a Python DataFrame.
* Define the two main types of data in Python: text and numeric.
* Examine the structure of a DataFrame.
* Modify the format of values in a DataFrame.
* Describe how data types impact operations.
* Define, manipulate, and interconvert integers and floats in Python.
* Analyze datasets having missing/null values (NaN values).
* Write manipulated data to a file.

## Types of Data
### Numeric Data Types

In [None]:
type(2 ** 48)

In [None]:
type(2.0 ** 48)

### Text Data Type

In [None]:
type("2048")
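
The difference between text and numbers matters because the same operator can behave differently on each. The following is a minimal sketch; the variable names are illustrative and not part of the lesson data.

```python
# The same characters stored as text and as a number
as_text = "2048"
as_number = int(as_text)  # convert the string to an integer

# + concatenates strings but adds numbers
text_result = as_text + "1"
number_result = as_number + 1

print(type(as_text), text_result)
print(type(as_number), number_result)
```

Mixing the two directly, as in `"2048" + 1`, raises a `TypeError`, which is one reason to check types before doing arithmetic.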

### Checking the format of our data

In [None]:
# first make sure pandas is loaded
import pandas as pd

# read in the survey csv
surveys_df = pd.read_csv("../data/surveys.csv")

In [None]:
type(surveys_df)

In [None]:
surveys_df['sex'].dtype

In [None]:
surveys_df['record_id'].dtype

In [None]:
surveys_df.dtypes

Native Python Type | Pandas Type | Description
-------------------|-------------|------------
`str`              | `object`    | The most general dtype. Assigned to a column that holds strings or a mix of types (for example, numbers and strings).
`int`              | `int64`     | 64-bit integer.
`float`            | `float64`   | Numbers with a decimal part. A column that contains numbers and NaN values (see below) defaults to `float64`.
N/A                | `datetime64`| Values that hold date and time data.
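
As a rough illustration of the table, the sketch below builds a small DataFrame and lets pandas infer the dtypes. The DataFrame `toy_df` and its column names are hypothetical and are not taken from `surveys.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy columns, not taken from surveys.csv
toy_df = pd.DataFrame({
    "only_ints": [1, 2, 3],           # whole numbers only
    "ints_with_nan": [1, np.nan, 3],  # whole numbers plus a missing value
    "mixed": [1, "two", 3.0],         # numbers and a string together
})

# Show the dtype pandas chose for each column
print(toy_df.dtypes)
```

The column with a missing value is stored as floats because NaN is itself a floating-point value, and the mixed column falls back to the general `object` dtype.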

### Working With Integers and Floats

In [None]:
print(5 + 5)
print(24 - 4)

In [None]:
print(5 / 9)
print(10 / 3)

print(5 // 9)
print(10 // 3)
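
The two division operators also differ in the type they return. A short sketch, with illustrative variable names:

```python
true_div = 10 / 2          # true division always returns a float
floor_div = 10 // 3        # floor division of two ints returns an int
negative_floor = -10 // 3  # floors toward negative infinity, not toward zero

print(true_div, type(true_div))
print(floor_div, type(floor_div))
print(negative_floor)
```

Note that `10 / 2` gives a float even though the result is a whole number.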

In [None]:
# convert a float to an integer (the decimal part is dropped, not rounded)
a = 7.83
print(int(a))

# convert an integer to a float
b = 7
print(float(b))
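
Because `int()` drops the decimal part rather than rounding, it can help to compare it with `round()` and `math.floor()`. This is a small sketch with illustrative values:

```python
import math

positive = 7.83
negative = -7.83

# int() truncates toward zero
truncated_pos = int(positive)
truncated_neg = int(negative)

# round() goes to the nearest integer
rounded_pos = round(positive)
rounded_neg = round(negative)

# math.floor() always goes down
floored_neg = math.floor(negative)

print(truncated_pos, rounded_pos)
print(truncated_neg, rounded_neg, floored_neg)
```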

## Working With Our Survey Data

In [None]:
# Convert the record_id field from an integer to a float
surveys_df['record_id'] = surveys_df['record_id'].astype('float64')
surveys_df['record_id'].dtype

### Exercise - Changing Types
Try converting the column `plot_id` to floats. Next try converting `weight` to an integer.

In [None]:
surveys_df.plot_id.astype("float")

In [None]:
# What happens if we try to convert weight values to integers?
# This raises an error: weight contains NaN values, which int64 cannot represent.
surveys_df['weight'].astype('int64')
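
One possible way to keep integers alongside missing values is pandas' nullable `Int64` dtype (capital "I"). The sketch below uses a hypothetical toy Series rather than the survey data, and assumes a reasonably recent pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy weights with one missing value
toy_weight = pd.Series([40.0, np.nan, 48.0])

# Plain int64 cannot hold NaN, so this conversion fails
try:
    toy_weight.astype("int64")
    conversion_failed = False
except ValueError as err:
    conversion_failed = True
    print("Conversion failed:", err)

# The nullable Int64 dtype stores the gap as <NA> instead
nullable = toy_weight.astype("Int64")
print(nullable)
```

This only works cleanly when the non-missing values are whole numbers; values with a decimal part would need rounding first.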

## Missing Data Values - NaN

In [None]:
surveys_df['weight'].mean()

### Where Are the NaNs?

In [None]:
# how many rows are missing a weight value?
len(surveys_df[pd.isnull(surveys_df.weight)])

In [None]:
# how many rows have a weight greater than 0? (NaN rows are excluded by the comparison)
len(surveys_df[surveys_df.weight > 0])
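
An alternative way to count missing and present values is to sum the boolean masks directly. The sketch below uses a hypothetical toy Series to show that `> 0` excludes both NaN and genuine zeros, while `notna()` excludes only NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy weights: two missing values and one real zero
toy_weight = pd.Series([40.0, np.nan, 0.0, 48.0, np.nan])

n_missing = int(toy_weight.isna().sum())    # rows where the value is NaN
n_present = int(toy_weight.notna().sum())   # rows with any real value
n_positive = int((toy_weight > 0).sum())    # rows strictly greater than 0

print(n_missing, n_present, n_positive)
```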

In [None]:
df1 = surveys_df.copy()

# fill all NaN values with 0
df1['weight'] = df1['weight'].fillna(0)

In [None]:
df1['weight'].mean()

In [None]:
# Fill NaN with the column mean instead, so the overall mean is unchanged
df1['weight'] = surveys_df['weight'].fillna(surveys_df['weight'].mean())

print(df1['weight'].mean())
print(surveys_df['weight'].mean())
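
The effect of each fill strategy is easier to see on a few numbers. This sketch uses a hypothetical toy Series, not the survey data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy weights with one missing value
toy_weight = pd.Series([10.0, 20.0, np.nan, 30.0])

# mean() skips NaN by default: (10 + 20 + 30) / 3
mean_skipping_nan = toy_weight.mean()

# Filling with 0 adds a real zero to the average: (10 + 20 + 0 + 30) / 4
mean_zero_filled = toy_weight.fillna(0).mean()

# Filling with the mean leaves the average unchanged
mean_mean_filled = toy_weight.fillna(mean_skipping_nan).mean()

print(mean_skipping_nan, mean_zero_filled, mean_mean_filled)
```

Filling with zero pulls the mean down, while filling with the mean preserves it. Neither choice adds real information, so the right one depends on the analysis.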

## Writing Out Data to CSV

In [None]:
# drop every row that has a NaN in any column
df_na = surveys_df.dropna()
df_na

In [None]:
# Write DataFrame to CSV
df_na.to_csv('surveys_complete.csv', index=False)
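
To check what `index=False` does, the sketch below writes a hypothetical toy DataFrame to an in-memory buffer instead of a file and reads it back. The use of `io.StringIO` is an assumption made here to avoid touching the disk; the lesson itself writes to `surveys_complete.csv`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data with one incomplete row
toy_df = pd.DataFrame({
    "record_id": [1, 2, 3],
    "weight": [40.0, np.nan, 48.0],
})
complete = toy_df.dropna()  # keep only rows without NaN

# Write to an in-memory text buffer, leaving out the row index
buffer = io.StringIO()
complete.to_csv(buffer, index=False)

# Read the CSV text back into a new DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

Without `index=False`, the row labels would be written as an extra unnamed first column and would come back as a column when the file is read again.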