# Obtaining Data Types in a DataFrame Column

### Import the Packages

In [None]:
import pandas as pd
import numpy as np
import os 

### Load the Dataset

In [None]:
filename = os.path.join(os.getcwd(), "data", "adult.data.partial")
df = pd.read_csv(filename, header=0)

### Inspect the Data 
Use the `head()` method to inspect DataFrame `df`.

In [None]:
df.head()

### Get Summary Statistics by Column Using the Pandas `describe()` Method

The Pandas DataFrame `describe()` method is a useful way to get a quick overview of the data and see key statistics for each column. Run the cell below to display the documentation for `describe()`. You can also consult the online [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html).

In [None]:
df.describe?

The code cell below runs the `describe()` method on DataFrame `df`. 

In [None]:
df.describe()
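To see which statistics `describe()` reports, and which columns it skips by default, here is a minimal sketch on a small made-up DataFrame. The column names and values are hypothetical and are not taken from `adult.data.partial`.

```python
import pandas as pd

# Hypothetical data mixing numeric and text columns (illustrative values only).
toy = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "hours_per_week": [40.0, 38.5, 50.0, 45.0],
    "sex": ["Male", "Female", "Female", "Male"],
})

# With no arguments, describe() summarizes only the numeric columns,
# so "sex" does not appear in the result.
summary = toy.describe()
print(summary)

# Every numeric column gets the same eight statistics.
print(list(summary.index))  # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
```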

### Get the Data Types for All Columns Using the Pandas `dtypes` Property

Note that some columns of `df` are missing from the summary statistics produced by `df.describe()`. This is because, by default, the `describe()` method only summarizes numeric columns. You can inspect the data type of each column's values with the `dtypes` property. Run the code cell below and inspect the results.

In [None]:
df.dtypes
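To connect `dtypes` with the behavior of `describe()`, here is a minimal sketch on a hypothetical DataFrame. It uses `select_dtypes()` to separate the numeric columns, which `describe()` summarizes by default, from the rest. The column names and values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical columns, each chosen to show one common dtype.
toy = pd.DataFrame({
    "age": [25, 32, 47],                   # Python ints   -> int64
    "hours_per_week": [40.0, 38.5, 50.0],  # Python floats -> float64
    "sex": ["Male", "Female", "Female"],   # strings -> object (newer pandas may use a string dtype)
})

print(toy.dtypes)

# For these columns, this split mirrors what describe() does by default:
# the numeric columns are summarized and the text column is skipped.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols)  # ['age', 'hours_per_week']
print(other_cols)    # ['sex']
```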

Let's take a closer look at the results.
Even if you are familiar with Python's data types, the results above may seem confusing. For example, what is an `object` type?
Not to worry: Pandas uses its own naming convention for data types. Here is a simple table to help you map Pandas data types to Python and NumPy types:

<table>
  <tr>
    <th>Pandas dtype</th>
    <th>Python type</th>
    <th>NumPy type</th>
    <th>Usage</th>
  </tr>
  <tr><td>object</td><td>str or mixed</td><td>string_, unicode_, mixed types</td><td>Text or mixed numeric and non-numeric values</td></tr>
  <tr><td>int64</td><td>int</td><td>int_, int8, int16, int32, int64, uint8, uint16, uint32, uint64</td><td>Integer numbers</td></tr>
  <tr><td>float64</td><td>float</td><td>float_, float16, float32, float64</td><td>Floating-point numbers</td></tr>
  <tr><td>bool</td><td>bool</td><td>bool_</td><td>True/False values</td></tr>
  <tr><td>datetime64</td><td>NA</td><td>datetime64[ns]</td><td>Date and time values</td></tr>
  <tr><td>timedelta64[ns]</td><td>NA</td><td>NA</td><td>Differences between two datetimes</td></tr>
  <tr><td>category</td><td>NA</td><td>NA</td><td>Finite list of text values</td></tr>
</table>
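The rows of this table can be checked directly. The sketch below builds a small hypothetical DataFrame with one column per dtype family and inspects the result; the column names and values are made up for illustration. Type checks use the `pd.api.types` helpers rather than exact dtype strings, because the exact datetime resolution and string dtype can vary between pandas versions.

```python
import pandas as pd

# Hypothetical columns, one per row of the table above.
toy = pd.DataFrame({
    "age": [25, 32, 47],                                                   # int64
    "hours_per_week": [40.0, 38.5, 50.0],                                  # float64
    "is_married": [True, False, True],                                     # bool
    "start": pd.to_datetime(["2024-01-01", "2024-02-15", "2024-03-31"]),   # datetime64
    "workclass": pd.Series(["Private", "State-gov", "Private"], dtype="category"),  # category
})

# Subtracting datetime values produces a timedelta64 column.
toy["since_first"] = toy["start"] - toy["start"].iloc[0]

print(toy.dtypes)

# The helpers in pd.api.types test a column's dtype family.
print(pd.api.types.is_datetime64_any_dtype(toy["start"]))     # True
print(pd.api.types.is_timedelta64_dtype(toy["since_first"]))  # True
print(isinstance(toy["workclass"].dtype, pd.CategoricalDtype))  # True
```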



In the cell below, call `df.describe()` with the parameter `include='all'`. This produces summary statistics for all columns in DataFrame `df`, not just the numeric ones. Examine the results. For columns containing string values, such as `label`, `sex`, and `race`, `describe()` now reports `count`, `unique`, `top` (the most frequent value), and `freq` (how often that value occurs), which gives a quick and easy way to assess class balance.
In particular, observe the values in `count`, `unique`, and `top` for the `label` column; comparing `freq` with `count` shows how dominant the most common class is.
Our dataset does not appear to have a stark imbalance between the label classes.

In [None]:
# YOUR CODE HERE - this cell will not be graded
df.describe(include='all')
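The balance check described above can be sketched on a tiny hypothetical DataFrame loosely modeled on the census columns. The rows are made up, so the proportions say nothing about the real dataset. The sketch also uses `value_counts(normalize=True)`, which reports each class's share directly.

```python
import pandas as pd

# Hypothetical rows; the values are illustrative, not from adult.data.partial.
toy = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "label": ["<=50K", "<=50K", ">50K", "<=50K"],
})

# include="all" adds count/unique/top/freq rows for the text column.
desc = toy.describe(include="all")
print(desc)

# For a text column, these four rows summarize class balance.
print(desc.loc[["count", "unique", "top", "freq"], "label"])

# Each class's share of the rows: 0.75 for "<=50K" and 0.25 for ">50K" here.
proportions = toy["label"].value_counts(normalize=True)
print(proportions)
```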

### A More Detailed Way to Read Column Types using `pd.api.types.infer_dtype()`

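The code cell below creates a dictionary in which each key is a column name and each value is that column's inferred data type. It uses the function `pd.api.types.infer_dtype()`, which examines the values stored in a column rather than the column's dtype. This lets it distinguish, for example, a column of strings (`'string'`) from a column that mixes strings and integers (`'mixed-integer'`), even though `dtypes` can report both as `object`. Run the cell below and inspect the results.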

In [None]:
types_dict = {}
for column in df.columns:
    types_dict[column] = pd.api.types.infer_dtype(df[column])

types_dict
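To contrast `infer_dtype()` with `dtypes`, here is a minimal sketch on hypothetical columns. It builds the same kind of dictionary as the loop above, written as a dict comprehension. The text-like columns are created with `dtype=object` explicitly so that `dtypes` reports `object` for all of them; the column names and values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical columns; the last three all have dtype object.
toy = pd.DataFrame({
    "age": [25, 32, 47],
    "hours": [40.0, 38.5, 50.0],
    "sex": pd.Series(["Male", "Female", "Female"], dtype=object),
    "occupation": pd.Series(["Sales", np.nan, "Tech-support"], dtype=object),
    "mixed": pd.Series(["a", 1, "b"], dtype=object),
})

# dtypes lumps the three text-like columns together as object...
print(toy.dtypes)

# ...while infer_dtype() inspects the values themselves.
inferred = {col: pd.api.types.infer_dtype(toy[col]) for col in toy.columns}
print(inferred)
# {'age': 'integer', 'hours': 'floating', 'sex': 'string',
#  'occupation': 'string', 'mixed': 'mixed-integer'}

# Missing values are skipped by default (skipna=True); with skipna=False,
# the NaN makes the occupation column count as "mixed".
print(pd.api.types.infer_dtype(toy["occupation"], skipna=False))  # mixed
```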