# Datetimes and Timezones

#### Dates:
* Create dates : `date(year, month, day)` ex: `date(2016, 6, 21)`
* Access date attributes: `weather_dates[0].year`, `weather_dates[1].month`, `weather_dates[2].day`
* Find weekday of a date (**weekdays are 0-indexed and week starts on Monday**): `weather_dates[3].weekday()`

```
early_hurricanes = 0
for hurricane in florida_hurricane_dates:
  if hurricane.month < 6:
    early_hurricanes = early_hurricanes + 1
print(early_hurricanes)
```
* Subtracting dates gives __`timedelta` objects__:
```
delta = date2 - date1
print(delta.days) #outputs 29
```
* Create a __`timedelta`__ object:
```
td = timedelta(days=29)
print(date1 + td) #outputs '2017-12-04'
```
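A minimal sketch tying the date operations above together. `date1` and `date2` are made-up values chosen for illustration; they are not taken from the hurricane or weather data.

```python
from datetime import date, timedelta

# Hypothetical dates, chosen only for illustration
date1 = date(2017, 11, 5)
date2 = date(2017, 12, 4)

# Attribute access
print(date1.year, date1.month, date1.day)  # 2017 11 5

# Weekdays are 0-indexed starting on Monday, so Sunday is 6
print(date1.weekday())  # 6

# Subtracting two dates gives a timedelta
delta = date2 - date1
print(delta.days)  # 29

# Adding a timedelta to a date gives a new date
td = timedelta(days=29)
print(date1 + td)  # 2017-12-04
```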
* We want to convert dates back into strings when, for example, we want to print results, put dates into a filename, or write dates out to a csv or excel file.
* Convert a `timedelta` into a number of seconds (a float) with the `.total_seconds()` method (or, for pd: `.dt.total_seconds()`).

#### Times:

* __ISO 8601 format:__ [international standard](https://en.wikipedia.org/wiki/ISO_8601) covering the worldwide exchange and communication of date- and time-related data.
    * `YYYY-MM-DD` for dates, `YYYY-MM-DDTHH:MM:SS` for datetimes
    * if we want the ISO representation of a date as a string (say, to write it to a CSV file): __`.isoformat()`__ method
    * ISO 8601 format ensures correct sorting: if we use ISO 8601 dates (or datetimes) in *file names*, they will sort correctly from earliest to latest
* __Every other format:__
    * `d.strftime()` : pass a format string
    * use `"%B"` to print the full name of the month ("August"), e.g. `"%Y-%B-%d"`
    * use `%j` for day of year DDD ("364")(`%j (%Y)`)
    * full list of format strings [here](https://pynative.com/python-datetime-format-strftime/).
    * very flexible, see example:
    
```
d = date(2015, 1, 5)
print(d.strftime("Year is %Y"))
#outputs: Year is 2015
```
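A short sketch of the formatting options above, including the sorting property of ISO 8601 strings. The dates are arbitrary illustrative values, and the `%B` output assumes an English locale.

```python
from datetime import date

d = date(2017, 8, 21)

# ISO 8601 representation
print(d.isoformat())           # 2017-08-21

# Custom formats via strftime
print(d.strftime("%Y-%B-%d"))  # 2017-August-21 (month name depends on locale)
print(d.strftime("%j (%Y)"))   # 233 (2017)

# ISO 8601 strings sort chronologically even when sorted as plain text
dates = [date(2017, 12, 1), date(2016, 6, 21), date(2017, 1, 15)]
iso_strings = sorted(x.isoformat() for x in dates)
print(iso_strings)  # ['2016-06-21', '2017-01-15', '2017-12-01']
```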

#### Datetimes:

* To create datetime representing "October 1, 2017, 3:23:25PM":
    * `dt = datetime(2017, 10, 1, 15, 23, 25)`
    * Note that hours are on 24 hour clock ("15")
    * all arguments must be whole numbers
    * if you wanted to add 0.5 seconds, you *can* pass microseconds as a seventh argument (Python datetimes do not go down to nanoseconds): `dt = datetime(2017, 10, 1, 15, 23, 25, 500000)`
* Create new datetimes from existing ones with `.replace()`:
    * `dt_hr = dt.replace(minute=0, second=0, microsecond=0)` 
    * outputs: `2017-10-01 15:00:00`
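A small runnable sketch of creating a datetime with microseconds and truncating it to the hour with `.replace()`:

```python
from datetime import datetime

# October 1, 2017, 3:23:25 PM plus half a second (500000 microseconds)
dt = datetime(2017, 10, 1, 15, 23, 25, 500000)
print(dt)              # 2017-10-01 15:23:25.500000
print(dt.isoformat())  # 2017-10-01T15:23:25.500000

# .replace() returns a new datetime; dt itself is unchanged
dt_hr = dt.replace(minute=0, second=0, microsecond=0)
print(dt_hr)           # 2017-10-01 15:00:00
```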

```
trip_counts = {'AM': 0, 'PM': 0}
for trip in onebike_datetimes:
  if trip['start'].hour < 12:    
    trip_counts['AM'] += 1
  else:    
    trip_counts['PM'] += 1  
```

#### Parsing

* `.strptime()`: short for "**str**ing **p**arse **time**"; parses a datetime from a string and takes two arguments:
    1. the string to turn into a datetime
    2. a format string describing how that string is laid out
    * Note: the format must match the string exactly, otherwise you get an error such as `ValueError: unconverted data remains: 15:19:13`
    
```
s = '2017-02-03 00:00:01'
fmt = '%Y-%m-%d %H:%M:%S'
d = datetime.strptime(s, fmt)
#####

fmt = "%Y-%m-%d %H:%M:%S"
onebike_datetimes = []
for (start, end) in onebike_datetime_strings:
  trip = {'start': datetime.strptime(start, fmt),
          'end': datetime.strptime(end, fmt)}
  onebike_datetimes.append(trip)
```
* `.strftime()` : short for **str**ing **f**rom **time**; returns a string from a time, takes one argument:
    * format string that indicates how (in what format) to turn datetime into a string
    * example: 
    
    `first_start = onebike_datetimes[0]['start']` \
    `fmt= "%Y-%m-%dT%H:%M:%S"` \
    `print(first_start.strftime(fmt))`
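A minimal round trip through `.strptime()` and `.strftime()`, plus what happens when the format does not cover the whole string. The input strings are illustrative.

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"

# Parse: string -> datetime
d = datetime.strptime("2017-02-03 00:00:01", fmt)
print(d)  # 2017-02-03 00:00:01

# Format: datetime -> string (ISO 8601 style with a "T" separator)
iso = d.strftime("%Y-%m-%dT%H:%M:%S")
print(iso)  # 2017-02-03T00:00:01

# A format that only covers the date leaves the time part unparsed
message = None
try:
    datetime.strptime("2017-02-03 15:19:13", "%Y-%m-%d")
except ValueError as err:
    message = str(err)
    print("ValueError:", message)
```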
        
#### Unix timestamp:
* Many computers store datetime information "behind the scenes" as the number of seconds since January 1st, 1970 (UTC), a reference point known as the Unix epoch and loosely associated with the birth of modern-style computers.
* This is especially common with computer infrastructure, like the log files that websites keep when they get visitors.
* The largest number that some older computers can hold in one variable is 2147483647 (2^31 - 1), which as a Unix timestamp falls on January 19, 2038. On that day, many computers which haven't been upgraded may fail.
* to read a unix timestamp: `print(datetime.fromtimestamp(ts))`
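A quick sketch of reading timestamps. Note that `datetime.fromtimestamp(ts)` without a timezone returns the machine's local time; passing `tz=timezone.utc` (an addition here, to make the output reproducible) returns UTC instead.

```python
from datetime import datetime, timezone

# Timestamp 0 is the Unix epoch
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch)  # 1970-01-01 00:00:00+00:00

# Largest value a signed 32-bit integer can hold
limit = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)
print(limit)  # 2038-01-19 03:14:07+00:00

# .timestamp() goes the other way: datetime -> seconds since the epoch
print(limit.timestamp())  # 2147483647.0
```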

#### Durations and timedeltas:
* Subtracting two dates gives a `timedelta`, and so does subtracting two datetimes. Because datetimes have both date and time, these "durations" are slightly more complicated and are usually read off in seconds.
* __duration:__ elapsed time between two events

```
duration = end - start
print(duration.total_seconds())
```
* __timedelta:__
```
delta1 = timedelta(days=1, seconds=1)
print(start + delta1)
```
* timedeltas can be created with any combination of weeks, days, hours, minutes, seconds, milliseconds, and microseconds, and can span up to about 2.7 million years
* can be negative
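A runnable sketch of durations and timedelta arithmetic. The `start` and `end` values are hypothetical trip times, not values from the bike data.

```python
from datetime import datetime, timedelta

# Hypothetical trip that crosses midnight
start = datetime(2017, 10, 8, 23, 46, 47)
end = datetime(2017, 10, 9, 0, 10, 57)

# Duration: subtracting datetimes gives a timedelta
duration = end - start
print(duration.total_seconds())  # 1450.0

# Positive timedelta: one day and one second later
delta1 = timedelta(days=1, seconds=1)
print(start + delta1)  # 2017-10-09 23:46:48

# Negative timedelta: one week earlier
delta2 = timedelta(weeks=-1)
print(start + delta2)  # 2017-10-01 23:46:47
```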

#### UTC offsets
* datetimes without timezones are referred to as "naive" and can't be compared across different parts of the world.
* Because the UK was the first to standardize its time, everyone in the world sets their clock relative to the original, historical, UK standard.
* __UTC:__ Coordinated Universal Time, the modern successor to that UK standard (Greenwich Mean Time); because all clocks are set relative to UTC, we can compare time around the world (UTC-x or UTC+x); for example, New England is UTC-5hrs during standard time
* Create a timezone object, which accepts a timedelta that explains how to translate your datetime into UTC:

```
# US Eastern Standard time zone
ET = timezone(timedelta(hours=-5))
# Timezone-aware datetime (specify what time zone the clock was in when time was specified):
dt = datetime(2017, 12, 30, 15, 9, 3, tzinfo = ET)
# Now if you print it, your datetime includes its UTC offset
print(dt)
# outputs: 2017-12-30 15:09:03-05:00
```
* Making the datetime aware of its timezone means you can ask Python new questions. For example, what is this ET time in India Standard Time? With `IST = timezone(timedelta(hours=5, minutes=30))`: `print(dt.astimezone(IST))`
* *Important difference between adjusting timezones and changing tzinfo:*
* Changing tzinfo: `print(dt.replace(tzinfo=timezone.utc))`; the clock reading stays the same and only the UTC offset changes, so the result is a different moment in time
* Adjusting timezones: `print(dt.astimezone(timezone.utc))`; changes **both** the UTC offset and the clock reading, so the result is the same moment in time
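A side-by-side sketch of `.replace(tzinfo=...)` versus `.astimezone(...)`, using fixed-offset timezones built by hand. The names `relabelled` and `converted` are illustrative.

```python
from datetime import datetime, timedelta, timezone

ET = timezone(timedelta(hours=-5))             # US Eastern Standard Time
IST = timezone(timedelta(hours=5, minutes=30)) # India Standard Time

dt = datetime(2017, 12, 30, 15, 9, 3, tzinfo=ET)
print(dt)                  # 2017-12-30 15:09:03-05:00
print(dt.astimezone(IST))  # 2017-12-31 01:39:03+05:30

# Changing tzinfo: same clock reading, new offset, different moment
relabelled = dt.replace(tzinfo=timezone.utc)
print(relabelled)          # 2017-12-30 15:09:03+00:00

# Adjusting timezone: clock and offset both change, same moment
converted = dt.astimezone(timezone.utc)
print(converted)           # 2017-12-30 20:09:03+00:00

print(converted == dt)     # True
print(relabelled == dt)    # False
```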

#### Timezone database
* It is nearly impossible to know every UTC offset you might need when aligning data to UTC, so instead use the `tz` database
* `from dateutil import tz`
* __`tz` database:__
    * Format: 'Continent/City' ; `et = tz.gettz('America/New_York')`
    * Includes international and *historical* timezones
    * Updates automatically for daylight savings, etc
    * Full list of tz timezones [here](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones)
```
uk = tz.gettz('Europe/London')
local = onebike_datetimes[0]['start']
notlocal = local.astimezone(uk)
####
ist = tz.gettz("Asia/Kolkata")
notlocal2 = local.astimezone(ist)
```
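A self-contained version of the conversion above, assuming the third-party `dateutil` package is installed. The `local` datetime is a made-up stand-in for `onebike_datetimes[0]['start']`.

```python
from datetime import datetime
from dateutil import tz

et = tz.gettz('America/New_York')
uk = tz.gettz('Europe/London')
ist = tz.gettz('Asia/Kolkata')

# Hypothetical ride start; the offset (-04:00) is looked up for that date
local = datetime(2017, 10, 1, 15, 23, 25, tzinfo=et)
print(local)                  # 2017-10-01 15:23:25-04:00

# Same moment, shown on clocks in London and Kolkata
notlocal = local.astimezone(uk)
notlocal2 = local.astimezone(ist)
print(notlocal)               # 2017-10-01 20:23:25+01:00
print(notlocal2)              # 2017-10-02 00:53:25+05:30
```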

#### Daylight Savings
* to fix problems with comparing datetimes *by hand*, we start by creating timezone objects:
    * `EST = timezone(timedelta(hours=-5))` #setting timezone "by hand"; Eastern *Standard* Time
    * `EDT = timezone(timedelta(hours=-4))` #Eastern *Daylight* Time
* to know correct timezone at a given time of year, without having to look it up and set by hand (as above):
    * `dateutil` automatically identifies and factors in Daylight Savings Time (and/or other timezone updates)
    * `start = datetime(2017, 3, 12, tzinfo = tz.gettz('America/New_York'))`
```
# How many hours have elapsed on the wall clock? (ignores the daylight saving shift)
print((end - start).total_seconds()/(60*60))
####
# convert to UTC before subtracting to account for daylight savings:
print((end.astimezone(timezone.utc) - start.astimezone(timezone.utc)).total_seconds()/(60*60))
```
* wrinkle in time from fall back:
    * `tz.datetime_ambiguous(first_1am)`
    * outputs True/False
    
```
eastern = tz.gettz('US/Eastern')
first_1am = datetime(2017, 11, 5, 1, 0, 0, tzinfo = eastern)
tz.datetime_ambiguous(first_1am) # output = True
second_1am = datetime(2017, 11, 5, 1, 0, 0, tzinfo = eastern)
second_1am = tz.enfold(second_1am)
```
* `tz.enfold()` says "this datetime belongs to the second time the wall clock struck 1am on this day, and not the first"
* enfold by itself doesn't change any behavior of a datetime `(first_1am - second_1am).total_seconds()` = 0.0
* **still need to convert to UTC, which is unambiguous**

``` 
first_1am = first_1am.astimezone(tz.UTC)
second_1am = second_1am.astimezone(tz.UTC)
```
* Now, `(second_1am - first_1am).total_seconds()` = 3600.0
* To **loop over possibly ambiguous times**:

```
# Loop over trips
for trip in onebike_datetimes:
  # Rides with ambiguous start
  if tz.datetime_ambiguous(trip['start']):
    print("Ambiguous start at " + str(trip['start']))
  # Rides with ambiguous end
  if tz.datetime_ambiguous(trip['end']):
    print("Ambiguous end at " + str(trip['end']))
```
* To **enfold and convert ambiguous datetimes to UTC with a for loop**:

```
for trip in onebike_datetimes:
  if trip['start'] > trip['end']:
    trip['end'] = tz.enfold(trip['end'])
  start = trip['start'].astimezone(tz.UTC)
  end = trip['end'].astimezone(tz.UTC)
```
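The two daylight saving cases in this section can be sketched end to end as follows, assuming `dateutil` is installed. The spring-forward `end` time and the variable names `wall_hours`, `real_hours`, `wall_gap`, and `real_gap` are illustrative choices, not from the original data.

```python
from datetime import datetime, timezone
from dateutil import tz

eastern = tz.gettz('America/New_York')

# Spring forward: on 2017-03-12 clocks jump from 2am to 3am
start = datetime(2017, 3, 12, 0, 0, 0, tzinfo=eastern)
end = datetime(2017, 3, 12, 6, 0, 0, tzinfo=eastern)

# Same tzinfo on both sides: Python compares wall-clock readings only
wall_hours = (end - start).total_seconds() / (60 * 60)
# Converting to UTC first gives the time that really elapsed
real_hours = (end.astimezone(timezone.utc)
              - start.astimezone(timezone.utc)).total_seconds() / (60 * 60)
print(wall_hours, real_hours)  # 6.0 5.0

# Fall back: on 2017-11-05 the wall clock reads 1am twice
first_1am = datetime(2017, 11, 5, 1, 0, 0, tzinfo=eastern)
second_1am = tz.enfold(datetime(2017, 11, 5, 1, 0, 0, tzinfo=eastern))
print(tz.datetime_ambiguous(first_1am))  # True

# enfold alone does not change arithmetic between the two
wall_gap = (second_1am - first_1am).total_seconds()
# After converting to UTC the one-hour gap appears
real_gap = (second_1am.astimezone(tz.UTC)
            - first_1am.astimezone(tz.UTC)).total_seconds()
print(wall_gap, real_gap)  # 0.0 3600.0
```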

#### Datetimes in pandas
* to read in certain columns as datetimes:
    * `rides = pd.read_csv('capital-onebike.csv', parse_dates= ['Start date', 'End date'])`
* pass to `parse_dates` a list of column names, passed as individual strings
* pandas tries to intuitively figure out the datetime format intended, but in case this doesn't work, change format manually:
    * `rides['Start date'] = pd.to_datetime(rides['Start date'], format = "%Y-%m-%d %H:%M:%S")`
* for essentially all purposes a pandas Timestamp object is the same as a Python datetime object (same behavior)
* within pandas, access all of the typical datetime methods, within the namespace `.dt`
    * `rides['Duration'].dt.total_seconds().head(5)` #converts duration (timedeltas) to seconds
* `.value_counts()`: returns how many times a given value appears
* `.groupby()`: group by values in any column; groupby takes a column name and does all subsequent operations on each group. Groupby is not limited to datetime-specific values. For example: group by member type and return mean duration in seconds of bike trips for each member type:
    * `rides.groupby('Member type')['Duration seconds'].mean()`
* `.resample()`: takes a unit of time (for example 'M' for month; newer pandas releases spell this alias 'ME') and a datetime column to group on; specifically, `.resample()` groups rows by some time or date information.
    * group rides by time 
    * `rides.resample('M', on = 'Start date')['Duration seconds'].mean()`
* Other common groupby combos for datetimes:
    * `.groupby().size()`
    * `.groupby().first()`
* `.groupby()` and `.resample()` together:
    * `grouped = rides.groupby('Member type').resample('M', on='Start date')`
* Just as in Python, **in pandas, all datetimes start off as timezone naive** (not tied to any absolute time with a UTC offset).
* **put into a timezone in pandas:**
    * `.dt.tz_localize('America/New_York')`
* **pandas AmbiguousTimeError**: raised if you try to localize a datetime (or a column of datetimes) that falls in the repeated hour when Daylight Savings Time ends. Alternatively, replace ambiguous times with `NaT`:
    * `.dt.tz_localize('America/New_York', ambiguous = 'NaT')`
    * pandas will skip over NaTs when it sees them: allows you to use .min, .sum, etc functions without error
    * `.dt.tz_convert()`: converts to a new timezone, whereas `.dt.tz_localize()` sets a timezone in the first place.
* other pandas datetime operations:
    * `.dt.day_name()` (older pandas versions exposed this as `.dt.weekday_name`)
    * `.shift()` #shift rows up or down: 
        * `rides['End date'].shift(1).head(3)`
        * useful, for example, if you want to line up the end times with the start time of next ride
        * `#Calculate the difference in the Start date of the current row and the End date of the previous row`
        * `# Shift the End date column down one row; now subtract it from the Start date`
        * `rides['Time since'] = rides['Start date'] - (rides['End date'].shift(1))`
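A small end-to-end sketch of the pandas operations above on a made-up three-row DataFrame (the real `capital-onebike.csv` is not used here). The column names follow the notes; the values and the `starts` series are hypothetical.

```python
import pandas as pd

# Hypothetical rides standing in for capital-onebike.csv
rides = pd.DataFrame({
    'Start date': pd.to_datetime(['2017-10-01 15:23:25',
                                  '2017-10-01 16:00:00',
                                  '2017-10-02 09:30:00']),
    'End date': pd.to_datetime(['2017-10-01 15:26:26',
                                '2017-10-01 16:45:00',
                                '2017-10-02 09:40:00']),
    'Member type': ['Member', 'Casual', 'Member'],
})

# Durations as timedeltas, then as seconds via the .dt namespace
rides['Duration'] = rides['End date'] - rides['Start date']
rides['Duration seconds'] = rides['Duration'].dt.total_seconds()

# Mean duration per member type
mean_by_type = rides.groupby('Member type')['Duration seconds'].mean()
print(mean_by_type)

# Gap between each ride's start and the previous ride's end
rides['Time since'] = rides['Start date'] - rides['End date'].shift(1)
print(rides['Time since'])

# Localizing: 01:30 on 2017-11-05 happens twice in New York, so it becomes NaT
starts = pd.Series(pd.to_datetime(['2017-11-05 01:30:00',
                                   '2017-11-05 03:00:00']))
localized = starts.dt.tz_localize('America/New_York', ambiguous='NaT')
print(localized)
```

The first row of `Time since` is `NaT` because there is no previous ride to compare against.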