## Notebook Topic: Getting Familiar with Python

<ins>Learning Objectives</ins>

1. To deal with missing data
2. To check object types

**Section III: Missing Data**

When working with real data, you will often have to deal with missing values.

For example, suppose you collect signals from individuals using your company's Global Positioning System (GPS) via satellite.  You do this in order to improve the maps being used by clients.  The satellite collects clear signals when the weather is good but cannot collect any signals when the weather is bad (like during a severe thunderstorm).  On that particular day, your GPS cannot provide accurate estimated times of arrival, inform you of upcoming traffic jams, or warn you about the cops up ahead.

Because missing data is more common than we think, we need to be prepared to deal with the situation!

* Missing data will typically appear as ```NaN``` or ```nan``` (short for "not a number").

Less commonly, you may encounter data that is represented as positive or negative infinity, so we will learn how to identify those values too.
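
Before checking for these values, it can help to see how they behave.  The short sketch below is only an illustration (the variable names are made up for this example): it creates a NaN and both infinities, then compares them with ordinary comparison operators.

```python
import math

# Two common ways to create the special values (both give plain floats)
missing = float("nan")   # the same kind of value as math.nan
pos_inf = float("inf")   # the same kind of value as math.inf
neg_inf = -math.inf

# NaN is never equal to anything, including itself,
# so `==` cannot be used to find it
nan_equals_itself = (missing == missing)

# Infinity compares as larger (or smaller) than every finite number
inf_is_largest = pos_inf > 1e308
neg_inf_is_smallest = neg_inf < -1e308

print(nan_equals_itself, inf_is_largest, neg_inf_is_smallest)
```

Because `==` does not work for NaN, we need dedicated functions to detect it, which is what the next code chunks use.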



In [None]:
# the math library has some useful functions
import math

x = math.nan
y = 3

math.isnan(y)

<span style="color:purple">Answer these questions in this text box: Why do you think you get that output for y?  What happens when you check x?</span>

In [None]:
w = -math.inf

math.isfinite(w)

Now, what happens when we have a vector containing these values?  Let's find out!

In [None]:
# remember we need to use the numpy library for vectors
import numpy as np

z = np.array([y, x, w])

In [None]:
np.isinf(z)

In [None]:
np.isnan(z)

In [None]:
np.isfinite(z)

<span style="color:purple">What is the primary difference between the checks on **z** versus what we did to **x**, **y**, and **w** directly?</span>

**Section IV: Object Type Checks**

Essentially, what we did was check whether the values of our objects (the simple **x**, **y**, and **w**; and the elements **x**, **y**, and **w** of the array **z**) were NaN, infinite, or finite.  We can check object types more broadly, as we'll see in this section.

**Object Types**

In the *00 - Pre_class.ipynb* notebook, we used a function called ```type```.  If you don't remember, go review that notebook!

There are a few basic object types in Python:

*   Numeric (such as integers, floats)
*   Strings (individual letters, words, and sentences; all denoted with quotation marks)
*   Logical/Boolean (True and False)

*type* will inform you whether a particular variable is of one of the types above.
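
As a quick illustration, the sketch below calls ```type``` on one made-up value of each basic type.  It also uses the built-in ```isinstance```, which was not introduced above, to ask a yes/no question about a type.

```python
# Hypothetical example values, one per basic type
count = 3          # integer
ratio = 3.0        # float
word = "three"     # string
flag = True        # boolean

# Print each value next to the type Python reports for it
for value in (count, ratio, word, flag):
    print(repr(value), type(value))

# isinstance answers True/False instead of returning the type itself
is_number = isinstance(ratio, (int, float))
is_text = isinstance(word, str)
print(is_number, is_text)
```

Note that Python reports specific types such as ```int``` and ```float``` rather than a single "numeric" type.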

In [None]:
# does this return numeric?
type(math.inf)

In [None]:
# what about this?
type(math.nan)

There are several higher-level object types.

* Vectors (which we've seen a few of already) and matrices (which we'll see in the near future) can only be all numeric or all strings or all boolean, etc.  We will become better acquainted with the *numpy* library for these.

* Data frames can have columns *from* each of the basic types.  Data frames are an object we will use a lot for data analysis.  One of the main libraries we will work with involving these data types is *pandas*.
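
The difference between the two can be sketched as follows.  The array and the data frame here are hypothetical examples, not data used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# A NumPy array stores a single type: mixing an integer and a float
# gives an array in which every element is a float
mixed = np.array([1, 2.5])
print(mixed.dtype)

# A data frame keeps one type per column (made-up columns)
people = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "height": [1.62, 1.80],
    "is_student": [True, False],
})
print(people.dtypes)
```

The ```dtype``` attribute of an array and the ```dtypes``` attribute of a data frame report the stored types.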


**Section V: Casting Types**

You can convert (cast) an object from one type to another as needed, but this only makes sense for certain pairs of types.  For example,

*   string <-> numeric
*   boolean <-> numeric
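
For single values, these conversions can be sketched with Python's built-in ```float```, ```str```, ```int```, and ```bool``` functions.  The variable names below are made up for illustration.

```python
# string <-> numeric
as_number = float("3.5")       # the string "3.5" becomes the float 3.5
as_text = str(42)              # the integer 42 becomes the string "42"

# boolean <-> numeric
true_as_int = int(True)        # True becomes 1
zero_as_bool = bool(0)         # 0 becomes False
nonzero_as_bool = bool(-2.5)   # any nonzero number becomes True

# A string that does not look like a number cannot be converted
try:
    float("three")
except ValueError as err:
    print("conversion failed:", err)
```

The code chunks that follow do the same kind of conversion for whole vectors at once.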

In [None]:
# when you have a vector of strings of numbers
alpha = np.array(["1", "2", "3", "4", "5"])
alpha.dtype

In [None]:
alpha.astype('float')


In [None]:
# when you have numbers you need to be strings
# draw 9 random values from a standard normal distribution
x = np.random.normal(loc=0, scale=1, size=9)
x


In [None]:

y = [str(i) for i in x]
type(y[0])  # is this true for all elements? check in a new code chunk

**Section VI: Dealing with Missing Data**

Let's create a dataset ```X``` that has some missing data.  And let's learn one way to deal with it.

In [None]:
import math
import numpy as np
import pandas as pd

# create data
mat = np.array([[1, math.nan, 3],[0, 0, 3],[-math.inf, -1, 3]])
# make columns names
c = ["height", "width", "axis angle"]
# make the data frame with data and columns names
X = pd.DataFrame(data = mat, columns = c, index = None)

X

<span style="color:purple">Describe what you see about ```X``` in this text box.</span>

Now let's check the data for any missing or non-finite values (i.e., any NaNs or infinities).  <span style="color:purple">Please follow the tasks in the comments.</span>

In [None]:
# fill in the blank with the function we used earlier to find NaNs in an array
___(X.values) 

In [None]:
# fill in the blank with the function we used earlier to find infinities in an array
____(X.values) 

To determine the elements in the dataset that are *either* nan **or** inf, we want to take the previous two lines of code and use one of the logical operators we learned about in notebook 02.  In the next code chunk, <span style="color:purple">go ahead and try to figure it out! You have everything you need to do it. </span>

In [None]:
____(X.values) <insert logical operator here> ____(X.values) 

<span style="color:purple">Assign the result of the line of code above to a variable named ```indx```.</span>  This is just like naming any other variable.
We will use this information to give ```X``` new values for those elements!

In [None]:
# assign through X itself: in newer pandas versions X.values may be read-only
X[indx] = np.mean(X.values[~indx])

<span style="color:purple">Describe what you think the above code is doing in this text box.  You have been exposed to lines of code like this before!</span>

In [None]:
X

<span style="color:purple">Describe what you notice about ```X``` now in this text box.  What are the main differences you notice?</span>
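
For comparison, here is a minimal sketch of another way to get a similar result using pandas methods.  It is an alternative, not the approach used above, and the names `df`, `cleaned`, and `fill_value` are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same kinds of problems as X
df = pd.DataFrame(
    [[1, np.nan, 3], [0, 0, 3], [-np.inf, -1, 3]],
    columns=["height", "width", "axis angle"],
)

# Step 1: treat infinities as missing too
cleaned = df.replace([np.inf, -np.inf], np.nan)

# Step 2: mean of the remaining (finite) values, ignoring NaN
fill_value = np.nanmean(cleaned.to_numpy())

# Step 3: fill every missing entry with that mean
cleaned = cleaned.fillna(fill_value)
print(cleaned)
```

This sketch fills every gap with one overall mean.  Filling each column with its own mean, for example with `cleaned.fillna(cleaned.mean())` in step 3, is another reasonable choice, and which one is appropriate depends on the data.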

## Conclusion

<span style="color:purple">Name at least four different things you *think* you were __meant__ to learn from this notebook.</span>

**Note:** I highly recommend you read Chapters 1-3 for an alternative introduction to Python and Jupyter notebooks.