## Convert time format

Our next step in processing our data is making our dates readable. 

For example:

1880.5 = 1880 + ½ of a year = 1880 + 6 months = 06/1880 or June 1880
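
The arithmetic above can be sketched in plain Python. This is only an illustration: `math.modf` is one of several ways to split a number into its whole and fractional parts, and the variable names are chosen just for this example.

```python
import math

date = 1880.5

# math.modf returns (fractional part, whole part), both as floats
fraction, year = math.modf(date)

# A fraction of a year times 12 gives the number of months elapsed
months_elapsed = fraction * 12

print(int(year), months_elapsed)  # 1880 6.0
```

Note that `months_elapsed` counts months since the start of the year; how that maps to a calendar month number depends on how the dataset defines its dates.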

**Our goal**: Convert the date column into separate columns for year and month

### Step 1: Do it for one date 

**Some useful functions**

numpy.ceil(**x**) : Rounds x up to the nearest whole number (the smallest whole number greater than or equal to x)
* **x**: number or array

numpy.floor(**x**) : Rounds x down to the nearest whole number (the largest whole number less than or equal to x)
* **x**: number or array

numpy.round(**x**, **decimals**) : Rounds the number x, or every element of the array x, to the given number of decimal places
* **x**: number or array
* **decimals**: number of decimal places to round to

numpy_array.astype(**data type**) : Converts elements in array to a different data type
* **data type**: int, float, str, etc.

In [1]:
import numpy as np

In [2]:
#Useful functions

test = 2.5
print('number:',test)
print('np.ceil(number):', np.ceil(test))
print('np.floor(number):', np.floor(test))
print('np.round(number):', np.round(test,0))

test_array = np.arange(2.75,3.25,0.05)
print('\n array:',test_array)
print('np.ceil(array):', np.ceil(test_array))
print('np.floor(array):', np.floor(test_array))
print('np.round(array):', np.round(test_array,0))

print('\n array with elements as int:',test_array.astype(int))

number: 2.5
np.ceil(number): 3.0
np.floor(number): 2.0
np.round(number): 2.0

 array: [2.75 2.8  2.85 2.9  2.95 3.   3.05 3.1  3.15 3.2 ]
np.ceil(array): [3. 3. 3. 3. 3. 3. 4. 4. 4. 4.]
np.floor(array): [2. 2. 2. 2. 2. 2. 3. 3. 3. 3.]
np.round(array): [3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]

 array with elements as int: [2 2 2 2 2 2 3 3 3 3]
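
Two results above can be surprising: `np.round(2.5)` gives 2.0 rather than 3.0, and `astype(int)` does not round. The sketch below, using made-up values, illustrates that NumPy rounds exact halves to the nearest even whole number, and that `astype(int)` simply drops the decimal part (which differs from `np.floor` for negative numbers).

```python
import numpy as np

# Exact halves are rounded to the nearest EVEN whole number
halves = np.array([0.5, 1.5, 2.5, 3.5])
rounded = np.round(halves)
print(rounded)                 # values: 0, 2, 2, 4

# floor rounds down; astype(int) drops the decimal part (truncates toward zero)
values = np.array([-1.7, 1.7])
floored = np.floor(values)
truncated = values.astype(int)
print(floored)                 # values: -2, 1
print(truncated)               # values: -1, 1
```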


Let's start small. Find the year and month associated with the given test date.

In [3]:
date = 1880.5
print(date)
#What do these variables equal?
#year =
#month = 

1880.5


### Step 2: Do it for an array of dates 

Remember: you can run arithmetic operations on numpy arrays as if they were single numbers. The operation is applied to every element.

For example:

```A = [1 , 2, 3]```

```A * 2 = [2 , 4, 6]```
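
As a small runnable sketch of the example above (the names `A` and `plain_list` are just for illustration), note that this elementwise behaviour belongs to numpy arrays. A plain Python list treats `* 2` as repetition instead.

```python
import numpy as np

A = np.array([1, 2, 3])
doubled = A * 2            # multiplies every element
print(doubled)             # [2 4 6]

plain_list = [1, 2, 3]
repeated = plain_list * 2  # repeats the list, no arithmetic
print(repeated)            # [1, 2, 3, 1, 2, 3]
```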

In [4]:
date_array = np.arange(1880, 1882.05, 1/12)
print(date_array)

#Round to one decimal place
year_array_rounded = np.round(date_array, 1)
print(year_array_rounded)

#Round down to the whole year
year_array_floor = np.floor(year_array_rounded)
print(year_array_floor)
#What do these arrays equal? 
#year_array =
#month_array =

[1880.         1880.08333333 1880.16666667 1880.25       1880.33333333
 1880.41666667 1880.5        1880.58333333 1880.66666667 1880.75
 1880.83333333 1880.91666667 1881.         1881.08333333 1881.16666667
 1881.25       1881.33333333 1881.41666667 1881.5        1881.58333333
 1881.66666667 1881.75       1881.83333333 1881.91666667 1882.        ]
[1880.  1880.1 1880.2 1880.2 1880.3 1880.4 1880.5 1880.6 1880.7 1880.7
 1880.8 1880.9 1881.  1881.1 1881.2 1881.2 1881.3 1881.4 1881.5 1881.6
 1881.7 1881.7 1881.8 1881.9 1882. ]
[1880. 1880. 1880. 1880. 1880. 1880. 1880. 1880. 1880. 1880. 1880. 1880.
 1881. 1881. 1881. 1881. 1881. 1881. 1881. 1881. 1881. 1881. 1881. 1881.
 1882.]


In [5]:
date_array = np.arange(1880, 1882.05, 1/12)
print(date_array)

month_array = np.ceil(12*(date_array - year_array_floor)+1)
print('\n', month_array)

[1880.         1880.08333333 1880.16666667 1880.25       1880.33333333
 1880.41666667 1880.5        1880.58333333 1880.66666667 1880.75
 1880.83333333 1880.91666667 1881.         1881.08333333 1881.16666667
 1881.25       1881.33333333 1881.41666667 1881.5        1881.58333333
 1881.66666667 1881.75       1881.83333333 1881.91666667 1882.        ]

 [ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12.  1.  2.  3.  4.  5.  6.
  7.  8.  9. 10. 11. 12.  1.]
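
The `np.ceil` formula above depends on `12*(date - year)` landing exactly on a whole number, which floating-point arithmetic does not always guarantee. A more defensive sketch, swapping in `np.rint` (round to the nearest whole number) for `np.ceil`, is shown below. It assumes every date sits at the start of a month, that is, a whole multiple of 1/12 of a year as in `date_array`. It is an alternative for comparison, not the method used in the cells above.

```python
import numpy as np

date_array = np.arange(1880, 1882.05, 1/12)

# Round first so a value like 1880.9999999 cannot be floored to the wrong year
year_array = np.floor(np.round(date_array, 3)).astype(int)

# Months elapsed since January, snapped to the nearest whole number, then shifted to 1..12
month_array = np.rint(12 * (date_array - year_array)).astype(int) + 1

print(year_array)
print(month_array)
```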


### Step 3: Do this for a Pandas data column

Remember: a pandas column supports the same elementwise operations and numpy functions as a numpy array
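
As a quick illustration with a made-up two-row column (the names `dates`, `years` and `fractions` are only for this sketch), a numpy function applied to a pandas column returns another pandas column, and arithmetic between columns works element by element.

```python
import numpy as np
import pandas as pd

dates = pd.Series([1880.25, 1881.75], name='Date')

# numpy functions accept a pandas column and return a pandas column
years = np.floor(dates)
print(type(years).__name__)   # Series
print(years.tolist())         # [1880.0, 1881.0]

# Arithmetic between two columns is applied row by row
fractions = dates - years
print(fractions.tolist())     # [0.25, 0.75]
```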

In [6]:
import pandas as pd

In [7]:
date_df = pd.DataFrame({'Date':date_array,'Data':np.random.rand(len(date_array))})
print(date_df.head())
#What do these columns equal?
#date_df['Year'] = 
#date_df['Month'] =

#round to three decimal places
year_df_rounded = np.round(date_df['Date'],3)
print('\n', year_df_rounded.head())

#round down to the nearest whole number
year_df_floor = np.floor(year_df_rounded)
print('\n', year_df_floor.head())

#find month
month_df = np.ceil(12*(date_df['Date'] - year_df_floor)+1)
print('\n', month_df.head())

          Date      Data
0  1880.000000  0.372875
1  1880.083333  0.836821
2  1880.166667  0.479535
3  1880.250000  0.929687
4  1880.333333  0.817060

 0    1880.000
1    1880.083
2    1880.167
3    1880.250
4    1880.333
Name: Date, dtype: float64

 0    1880.0
1    1880.0
2    1880.0
3    1880.0
4    1880.0
Name: Date, dtype: float64

 0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
Name: Date, dtype: float64


In [8]:
#create column for year
date_df['Year'] = year_df_floor

#create new column for month
date_df['Month']= month_df

print(date_df.head())



          Date      Data    Year  Month
0  1880.000000  0.372875  1880.0    1.0
1  1880.083333  0.836821  1880.0    2.0
2  1880.166667  0.479535  1880.0    3.0
3  1880.250000  0.929687  1880.0    4.0
4  1880.333333  0.817060  1880.0    5.0
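
The new columns are stored as floats (1880.0, 1.0). If whole numbers are preferred, one option is `astype`, introduced in Step 1. The sketch below uses a small stand-in dataframe, `example_df`, rather than `date_df` itself.

```python
import pandas as pd

# Stand-in for the Year and Month columns created above
example_df = pd.DataFrame({'Year': [1880.0, 1880.0], 'Month': [1.0, 2.0]})

# Convert both columns from float to int
example_df = example_df.astype({'Year': int, 'Month': int})

print(example_df)
print(example_df.dtypes)
```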


### Step 4: Write a function

Write a function that takes any Pandas column with dates in this format and creates a new dataframe with columns for year and month instead.

Discuss in groups what should go into this function skeleton, and write your pseudocode in your lab notes.

```
def function_name(function_inputs):
    do something
    return function_outputs
```
        

In [9]:
#Function goes here

def convert_date_to_year_month(x):
    #round to three decimal places
    year_df_rounded = np.round(x['Date'],3)

    #round down to the nearest whole number
    year_df_floor = np.floor(year_df_rounded)

    #create column for year (this modifies x in place)
    x['Year'] = year_df_floor

    #find month
    month_df = np.ceil(12*(x['Date'] - year_df_floor)+1)

    #create new column for month
    x['Month'] = month_df

    return x

In [10]:
convert_date_to_year_month(date_df)

          Date      Data    Year  Month
0  1880.000000  0.372875  1880.0    1.0
1  1880.083333  0.836821  1880.0    2.0
2  1880.166667  0.479535  1880.0    3.0
3  1880.250000  0.929687  1880.0    4.0
4  1880.333333  0.817060  1880.0    5.0
5  1880.416667  0.545628  1880.0    6.0
6  1880.500000  0.224031  1880.0    7.0
7  1880.583333  0.352284  1880.0    8.0
8  1880.666667  0.254866  1880.0    9.0
9  1880.750000  0.965844  1880.0   10.0
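
The function above adds columns to the dataframe it is given. The task statement asks for a function that takes a column and creates a new dataframe, so one possible variant is sketched below. The name `year_month_frame` and the `example` data are hypothetical. The formula is the same one used above, and the sample dates are chosen to be exact in floating point.

```python
import numpy as np
import pandas as pd

def year_month_frame(dates):
    """Hypothetical helper: build a new Year/Month dataframe from a decimal-year column."""
    # Same steps as above: round, floor for the year, then scale the remainder to months
    year = np.floor(np.round(dates, 3))
    month = np.ceil(12 * (dates - year) + 1)
    # Return a fresh dataframe; the input column is left untouched
    return pd.DataFrame({'Year': year.astype(int), 'Month': month.astype(int)})

example = pd.DataFrame({'Date': [1880.0, 1880.25, 1880.5, 1881.75]})
result = year_month_frame(example['Date'])

print(result)
print(list(example.columns))  # ['Date']
```

Returning a new dataframe keeps the original data unchanged, which makes it safe to re-run the cell.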


In [11]:
#Example of a simple function with two inputs and one output
def addition (a,b):
    c= a + b
    return c
    

addition (1.2,2)

3.2

In [12]:
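date = 1880.5

def conv(x):
    #the whole-number part of the date is the year
    year = np.floor(x)

    #months elapsed since the start of that year, rounded up
    #(there is no +1 here, so 1880.5 gives 6 rather than the 7 from the array formula above, and 1880.0 gives 0)
    month = np.ceil(12*(x - year))
    return (year, month)

conv(date)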

(1880.0, 6.0)