# **Lecture 10B**
# **Reading time columns from data files**
In this part, we will learn how to prepare time data in Excel and CSV files and how to read those time columns into a DataFrame correctly.

Before you start, run the two cells below to connect to Google Drive and load the pandas module.

In [None]:
# Run the code below to access files in your Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# We also need the pandas module in this lecture
# Import the pandas module
import pandas as pd

---
**Example 1:** In Excel, there is no distinction between *datetime* and *time duration* data. Excel simply ignores the date part when you enter a time duration. When such data are read into Python, they show up as datetime64 type. However, we want them as timedelta64 (time duration) in Python.

When you have time duration data, it is better to store them as strings in the Excel or CSV file. Once they are read, we convert these strings to timedelta64.
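
To make the difference between a datetime and a time duration concrete, here is a minimal sketch. The two timestamps are made-up values chosen only for illustration; they do not come from the lecture's data files.

```python
import pandas as pd

# A datetime is a point on the calendar; a timedelta is a length of time.
start = pd.Timestamp("2024-03-01 08:00:00")  # hypothetical start moment
end = pd.Timestamp("2024-03-02 10:30:00")    # hypothetical end moment

# Subtracting two datetimes gives a duration (Timedelta)
elapsed = end - start
print(type(start).__name__)    # Timestamp
print(type(elapsed).__name__)  # Timedelta
print(elapsed)                 # 1 days 02:30:00

# A duration can be added back to a datetime
print(start + elapsed == end)  # True
```

Note that the duration here is longer than 24 hours, which a plain time-of-day value cannot represent.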

In the **time_txt** worksheet of **time_examples.xlsx**, the column **Duration** is stored as a string in Excel. When this worksheet is read, the column shows up as a string column (dtype object) in the DataFrame. We will convert it to timedelta64 by using the **pd.to_timedelta()** function.
* **pd.to_timedelta(*time_string*)** converts the given string, or a Series of strings, into type **timedelta64** (i.e. time duration).
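
Before reading the Excel file, it may help to try the conversion on a small hand-made Series. The strings below are hypothetical and are not taken from **time_examples.xlsx**; the last one is deliberately invalid to show how `errors="coerce"` behaves.

```python
import pandas as pd

# Hypothetical duration strings (not from time_examples.xlsx)
raw = pd.Series(["01:30:00", "00:00:45.5", "not a time"])

# errors="coerce" turns entries that cannot be parsed into NaT
# instead of raising an error
durations = pd.to_timedelta(raw, errors="coerce")
print(durations)
print(durations.dtype)

# Once converted, a duration column can be turned into plain numbers
seconds = durations.dt.total_seconds()
print(seconds)  # 5400.0, 45.5, NaN
```

Without `errors="coerce"`, the default behaviour is to raise an error on the first string that cannot be parsed, which is often what you want while checking a new data file.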

In [None]:
# Read time_examples.xlsx data file
# In the first worksheet "time_txt", Duration column is string in Excel
data = pd.read_excel("/content/drive/MyDrive/Data/time_examples.xlsx",sheet_name="time_txt")
display(data)
print(type(data))
print(data.dtypes)

# Convert Duration from string to timedelta64
data["DurTime"] = pd.to_timedelta(data["Duration"])

# Show the type of the columns in the DataFrame
display(data)
print(data.dtypes)

Unnamed: 0,JobID,Duration
0,1,20:08:32.12
1,2,07:12:42.33
2,3,09:33:12.98
3,4,17:45:39.05
4,5,21:19:11.67


<class 'pandas.core.frame.DataFrame'>
JobID        int64
Duration    object
dtype: object


Unnamed: 0,JobID,Duration,DurTime
0,1,20:08:32.12,0 days 20:08:32.120000
1,2,07:12:42.33,0 days 07:12:42.330000
2,3,09:33:12.98,0 days 09:33:12.980000
3,4,17:45:39.05,0 days 17:45:39.050000
4,5,21:19:11.67,0 days 21:19:11.670000


JobID                 int64
Duration             object
DurTime     timedelta64[ns]
dtype: object


---
**Example 2:** In the **time_num** worksheet of **time_examples.xlsx**, the hour, minute and second are stored in separate columns. We can combine them into a time duration using **pd.to_timedelta()**.
* **pd.to_timedelta(*Time*,unit=*code*)** will produce a Series of type **timedelta64**. ***Time*** is a numeric Series containing the duration in a given time unit. The unit is specified by the ***code***.
* Here are some possible values for ***code***: "days", "hours", "minutes", "seconds", "milliseconds", "microseconds", "nanoseconds".
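
The effect of the ***code*** argument can be checked on a few hand-made numbers. The sketch below uses hypothetical values (not the contents of **time_examples.xlsx**) and also shows an alternative to converting everything to seconds first: convert each component with its own unit, then add the results.

```python
import pandas as pd

# The same duration expressed in two different units
print(pd.to_timedelta(1.5, unit="hours"))   # 0 days 01:30:00
print(pd.to_timedelta(90, unit="minutes"))  # 0 days 01:30:00

# Hypothetical hour, minute and second columns
hours = pd.Series([1, 36])
minutes = pd.Series([30, 0])
seconds = pd.Series([15.5, 0.0])

# Convert each component with its own unit, then add the durations
combined = (
    pd.to_timedelta(hours, unit="hours")
    + pd.to_timedelta(minutes, unit="minutes")
    + pd.to_timedelta(seconds, unit="seconds")
)
print(combined)
print(combined.dtype)
```

Both approaches should give the same durations; adding the components separately simply avoids writing the 3600 and 60 conversion factors by hand.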



In [None]:
# Read time_examples.xlsx data file
# In the worksheet "time_num", the hour, minute and second are in separate numeric columns
data = pd.read_excel("/content/drive/MyDrive/Data/time_examples.xlsx",sheet_name="time_num")

# Compute the total duration in seconds, then convert it to timedelta64
total_seconds = data["Duration_Hour"]*3600 + data["Duration_Min"]*60 + data["Duration_Sec"]
data["Duration"] = pd.to_timedelta(total_seconds, unit="seconds")

# Display the DataFrame
display(data)
print(data.dtypes)

Unnamed: 0,JobID,Duration_Hour,Duration_Min,Duration_Sec,Duration
0,1,45,45,23.45,1 days 21:45:23.450000
1,2,11,32,11.3,0 days 11:32:11.300000
2,3,56,11,55.1,2 days 08:11:55.100000
3,4,18,29,45.4,0 days 18:29:45.400000
4,5,9,38,7.88,0 days 09:38:07.880000


JobID                      int64
Duration_Hour              int64
Duration_Min               int64
Duration_Sec             float64
Duration         timedelta64[ns]
dtype: object
