
## Inspecting Data Types and Applying Functions

Author: Dr. Elaina A. Hyde

---

In [1]:
import pandas as pd
import numpy as np

**1. Create a small DataFrame with different data types (provided).**

In [2]:
# create a small DataFrame from a dictionary whose values have
# different data types (scalars are broadcast to every row)

dft = pd.DataFrame(dict(A=np.random.rand(3),
                        B=1,
                        C='foo',
                        D=pd.Timestamp('20010102'),
                        E=pd.Series([1.0]*3).astype('float32'),
                        F=False,
                        G=pd.Series([1]*3, dtype='int8')))

dft

          A  B    C          D    E      F  G
0  0.316957  1  foo 2001-01-02  1.0  False  1
1   0.27384  1  foo 2001-01-02  1.0  False  1
2  0.994847  1  foo 2001-01-02  1.0  False  1


**2. Examine the data types of the columns.**

In [3]:
# The .dtypes attribute returns the dtype of each column.
dft.dtypes

A           float64
B             int64
C            object
D    datetime64[ns]
E           float32
F              bool
G              int8
dtype: object

**3. Create a Series object with the integers 1-5 and float 6.0. What data type is the Series?**

In [4]:
# If a single column of a pandas object contains data of 
# multiple dtypes, the dtype of the column is chosen to 
# accommodate all of the values (object is the most general).
# Here the ints are coerced to floats.

pd.Series([1, 2, 3, 4, 5, 6.])

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0
dtype: float64
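
To make the coercion explicit, the short sketch below compares an all-integer Series with one that also holds a float. The names `ints_only` and `mixed` are illustrative and not part of the exercise.

```python
import pandas as pd

# Illustrative names; not part of the original exercise.
ints_only = pd.Series([1, 2, 3, 4, 5])
mixed = pd.Series([1, 2, 3, 4, 5, 6.0])

# With only integers pandas keeps an integer dtype
# (its exact width can depend on the platform).
print(ints_only.dtype)

# A single float is enough to upcast the whole column to float64.
print(mixed.dtype)
```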

**4. Create a Series with data: `[1, 2, 3, 6., 'foo']`. What data type is the series?**

In [5]:
# string data forces an ``object`` dtype

pd.Series([1, 2, 3, 6., 'foo'])

0      1
1      2
2      3
3      6
4    foo
dtype: object
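
In an `object` column the values are stored as ordinary Python objects, so each element keeps its own type. A minimal sketch of how one might check this (the names `s_obj` and `element_types` are illustrative):

```python
import pandas as pd

# Illustrative names; not part of the original exercise.
s_obj = pd.Series([1, 2, 3, 6.0, 'foo'])

# Collect the Python type of every element in the object column.
element_types = [type(v).__name__ for v in s_obj]

print(s_obj.dtype)
print(element_types)
```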

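**5. Find how many columns of each type there are. (`.get_dtype_counts()` was deprecated in pandas 0.25 and removed in 1.0; use `.dtypes.value_counts()` instead.)**

In [6]:
# get_dtype_counts() is no longer available in pandas 1.0+. 
# dtypes.value_counts() returns the number of columns of each 
# dtype in a DataFrame (the row order may differ from the 
# output shown here):

dft.dtypes.value_counts()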

bool              1
datetime64[ns]    1
float32           1
float64           1
int64             1
int8              1
object            1
dtype: int64

**6. Create another small DataFrame (provided).**

In [7]:
# create a small data frame. 

df = pd.DataFrame(np.random.randn(5, 4), columns=['a', 'b', 'c', 'd'])
df

          a         b         c         d
0  1.150589 -1.084208  -0.16799  0.358656
1  -1.34139  0.142643 -0.187199  1.033294
2 -0.512068  1.805589  0.501125  0.601134
3 -1.101561  0.945773  -0.38989 -0.929193
4 -1.139032 -1.580035 -0.585217 -0.825282


**7. Use the `.apply()` function to find the square root of all the cells.**

In [8]:
# Use df.apply to find the square root of all the values. 
# The square root of a negative number is NaN (not a number).

df.apply(np.sqrt)

          a         b         c         d
0  1.072655       NaN       NaN  0.598879
1       NaN   0.37768       NaN   1.01651
2       NaN  1.343722  0.707902  0.775328
3       NaN  0.972509       NaN       NaN
4       NaN       NaN       NaN       NaN
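
For an element-wise NumPy function such as `np.sqrt`, passing it to `.apply()` should give the same result as calling it on the DataFrame directly. The sketch below checks this on a tiny hand-made frame; `demo`, `via_apply`, and `direct` are illustrative names, and the values were chosen so the results are easy to verify by eye.

```python
import numpy as np
import pandas as pd

# Illustrative frame with one negative entry; not the exercise data.
demo = pd.DataFrame({'a': [4.0, -1.0], 'b': [9.0, 16.0]})

# Suppress NumPy's "invalid value" warning for the negative input.
with np.errstate(invalid='ignore'):
    via_apply = demo.apply(np.sqrt)   # function applied column by column
    direct = np.sqrt(demo)            # function called on the whole frame

print(via_apply)

# DataFrame.equals treats NaNs in matching positions as equal.
print(via_apply.equals(direct))
```

The negative entry comes back as NaN in both versions, while the other cells hold the usual square roots.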


**8. Use `.apply()` to find the mean of the columns.**

In [9]:
# find the mean of all of the columns

df.apply(np.mean, axis=0)

a   -0.588692
b    0.045952
c   -0.165834
d    0.047722
dtype: float64

**9. Find the mean of the rows.**

In [10]:
# find the mean of all of the rows

df.apply(np.mean, axis=1)

0    0.064262
1   -0.088163
2    0.598945
3   -0.368718
4   -1.032392
dtype: float64
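
The `axis` argument decides what each call to the function receives: with `axis=0` it gets one column at a time, and with `axis=1` one row at a time. A small sketch on hand-picked numbers, compared against the built-in `DataFrame.mean()` (the names `demo`, `col_means`, and `row_means` are illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative frame; not the exercise data.
demo = pd.DataFrame({'a': [1.0, 3.0], 'b': [5.0, 7.0]})

# axis=0: the function sees each column, giving one value per column.
col_means = demo.apply(np.mean, axis=0)

# axis=1: the function sees each row, giving one value per row.
row_means = demo.apply(np.mean, axis=1)

print(col_means)
print(row_means)

# The built-in reductions should agree with the apply-based versions.
print(col_means.equals(demo.mean(axis=0)))
print(row_means.equals(demo.mean(axis=1)))
```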

**10. Use numpy to create a random vector of 50 numbers ranging from 0 to 6.**

*Hint: This can be done with `np.random.randint()`.*

In [11]:
# Let's create a random array of 50 integers ranging from 
# 0 to 6. The upper bound passed to randint (7) is exclusive.

data = np.random.randint(0, 7, size=50)
data

array([4, 3, 4, 2, 4, 6, 1, 2, 3, 0, 4, 5, 1, 0, 5, 6, 4, 1, 6, 3, 3, 4, 0,
       6, 0, 1, 2, 5, 2, 0, 3, 3, 4, 5, 1, 5, 1, 1, 4, 4, 5, 3, 6, 0, 2, 6,
       6, 0, 0, 2])
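
Because the draw is random, the array changes on every run. If reproducible output is wanted, one option is a seeded generator. The sketch below swaps in NumPy's newer `Generator` interface rather than the `np.random.randint()` call used in the exercise; the seed value and the names `rng` and `sample` are arbitrary assumptions.

```python
import numpy as np

# Assumption: seed 0 is arbitrary; any fixed integer gives repeatable draws.
rng = np.random.default_rng(seed=0)

# As with randint, the upper bound (7) is exclusive, so values lie in 0..6.
sample = rng.integers(0, 7, size=50)

print(sample.min(), sample.max())
```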

**11. Convert the vector to a Series and count the occurrences of each number.**

In [12]:
# convert the array into a series

s = pd.Series(data)

In [13]:
# How many times does each number occur in the Series? 
# Use the Series method value_counts() (the top-level 
# pd.value_counts() is deprecated in recent pandas versions).

s.value_counts()

4    9
0    8
6    7
3    7
1    7
5    6
2    6
dtype: int64
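
By default the counts are sorted from most to least frequent. The following sketch, on a short hand-made Series, shows two common variations: ordering by the values themselves and reporting proportions instead of counts. The names `s_demo`, `counts`, `by_value`, and `shares` are illustrative.

```python
import pandas as pd

# Illustrative data; not the random vector from the exercise.
s_demo = pd.Series([4, 3, 4, 2, 4, 6, 1, 2])

# Default: most frequent value first.
counts = s_demo.value_counts()

# Sort by the values (the index of the result) instead of by frequency.
by_value = counts.sort_index()

# normalize=True returns the fraction of entries for each value.
shares = s_demo.value_counts(normalize=True)

print(by_value)
print(shares.loc[4])
```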