# Dealing with Strings and Dates

## Strings

A string is a sequence of characters.

Strings in Python are easy to work with.


### String Creation

A string can be created by enclosing characters inside single quotes or double quotes.

```python
str1 = 'This is a string'  # named str1 because str is the name of Python's built-in string type

str2 = "This is also a string"

```
Here are the commonly used string operations:


### Concatenation:

To concatenate two strings, "chair" and "table", you can use:

```python
furniture = "chair" + ", table"
```
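
The `+` operator only joins strings with other strings, so a number has to be converted first. The following is a minimal sketch; the variable names `item`, `count`, `label` and `label_f` are made up for illustration.

```python
item = "chair"
count = 4

# "chair" + 4 would raise a TypeError, so the number is converted with str() first
label = item + " x " + str(count)

# An f-string builds the same text without an explicit conversion
label_f = f"{item} x {count}"

print(label)
print(label_f)
```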

### Other operations:

.lower() and .upper() return a lowercase or uppercase copy of a string, and .title() capitalizes the first letter of each word.

For example:
```python
str1 = 'A String example'
print('Uppercase:', str1.upper())
print('Lowercase:', str1.lower())
print('Uppercase each word:', str1.title())

# Output:
>>> Uppercase: A STRING EXAMPLE
    Lowercase: a string example
    Uppercase each word: A String Example
```

Ref: https://docs.python.org/3/library/string.html

### Regular Expressions:

A regular expression (regex) is a sequence of characters that defines a pattern, which can help us find or replace certain strings.

You can use the 're' package to work with regular expressions. For example, the following code finds the position of the string "Mango".

```python
import re  # Import the re library
mangoes = re.search("Mango", "This is a Mango")  # Search for the first occurrence of the string "Mango"
mangoes.group(0)  # Returns the matched text, 'Mango'
print(mangoes)
# Output (newer Python versions show re.Match instead of _sre.SRE_Match):
# <_sre.SRE_Match object; span=(10, 15), match='Mango'>
# The span gives 10 as the starting index of "Mango" and 15 as its ending index in the input string.
```
The example above searches for an exact match. Regular expressions become much more powerful when the pattern describes a whole family of strings rather than one literal. The following are some frequently used symbols in regex:

<table style="width:100%">
<font size="20" face="Courier New" >
  <tr>
    <th colspan="2" style="text-align:center">Identifiers</th>
    <th colspan="2" style="text-align:center">Modifiers</th> 
    <th colspan="2" style="text-align:center">White space characters</th>
  </tr>
  <tr>
    <td>\d</td>
    <td>any digit</td>
    <td>{1,3}</td>
    <td>between 1 and 3 repetitions</td>
    <td>\n</td>
    <td>new line</td>
  </tr>
  <tr>
    <td>\D</td>
    <td>anything but a number</td> 
    <td>+</td>
    <td>Match 1 or more</td>
    <td>\s</td>
    <td>any whitespace character</td>
  </tr>
  <tr>
    <td>\s</td>
    <td>any whitespace character</td>
    <td>?</td>
    <td>Match 0 or one</td>
    <td>\t</td>
    <td>tab</td>
  </tr>
  <tr>
    <td>\S</td>
    <td>anything but a space</td> 
    <td>*</td>
    <td>Match 0 or more</td>
    <td>\e</td>
    <td>escape (not supported by Python's re module)</td>
  </tr>
  <tr>
    <td>\w</td>
    <td>any word character (letter, digit or underscore)</td>
    <td>$</td>
    <td>Match the end of a string</td>
    <td>\f</td>
    <td>form feed</td>
  </tr>
  <tr>
    <td>\W</td>
    <td>anything but a word character</td>
    <td>^</td>
    <td>Match the beginning of a string</td>
    <td>\r</td>
    <td>return</td>
  </tr>
  <tr>
    <td>.</td>
    <td>any character, except for a newline</td> 
    <td>|</td>
    <td>either or</td>
    <td></td>
    <td></td>
  </tr>
  <tr>
    <td>\b</td>
    <td>a word boundary (the position between a word character and a non-word character)</td>
    <td>[]</td>
    <td>a set of allowed characters, e.g. [1-5a-qA-Z]</td>
    <td></td>
    <td></td>
  </tr>
  <tr>
    <td>\.</td>
    <td>a period</td> 
    <td>{x}</td>
    <td>exactly x repetitions</td>
    <td></td>
    <td></td>
  </tr>
</table>

A good website for finding documentation and testing your regex: https://regexr.com/

Ref: https://docs.python.org/3/library/re.html
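
Several of the symbols from the table can be tried out in one small sketch. The sample strings and variable names below are invented for illustration and are not part of the original material.

```python
import re

# Hypothetical sample text, used only for illustration
text = "Order 66 shipped on day 3; order 1024 is pending."

# \d+ : one or more digits in a row
numbers = re.findall(r'\d+', text)

# [Oo]rder : a character set matching either 'O' or 'o'
orders = re.findall(r'[Oo]rder', text)

# colou?r : '?' makes the preceding 'u' optional
colors = re.findall(r'colou?r', 'color and colour')

# ^ and $ anchor the pattern to the beginning and the end of the string
starts_with_order = re.search(r'^Order', text) is not None
ends_with_period = re.search(r'\.$', text) is not None

# cat|dog : match either alternative
pets = re.findall(r'cat|dog', 'a cat, a dog and a bird')

print(numbers)
print(orders)
print(colors)
print(starts_with_order, ends_with_period)
print(pets)
```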

### Exercise

* a) Given two strings a = "hello! How are you? " and b = "how can I help you?", concatenate them into a new string c. Print c
* b) Use regular expressions to search for 'hel' in string c and assign it to variable hel_strings

In [None]:
a = "hello! How are you? "
b = "how can I help you?"

# regular expression


### Solution

```python

import re

c = a + b
print (c)

hel_strings = re.search("hel", c)
print(hel_strings.group(0))

```

After successful execution of this exercise, we encourage you to try to find the index of second 'hel' for fun and practice.
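
One possible approach to that follow-up, sketched with `re.finditer`, which yields a match object for every non-overlapping occurrence. The names `starts` and `second_index` are made up for this sketch.

```python
import re

a = "hello! How are you? "
b = "how can I help you?"
c = a + b

# Collect the starting index of every occurrence of 'hel'
starts = [match.start() for match in re.finditer("hel", c)]
print(starts)

# Pick the second occurrence if there is one, otherwise fall back to -1
second_index = starts[1] if len(starts) > 1 else -1
print(second_index)
```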

## Wild cards in Regular Expression

A period symbol is used to match any single character except a newline.

```python
re.search(r'Bo.st.r', 'Booster').group()
```

Here, the prefix letter 'r' marks the string as a raw string. When writing regular expressions, remember to add 'r' as a prefix, because otherwise Python interprets some backslash sequences itself, for example '\n':
```python
print('\nNew Line')     #print a blank line and then the two words "New Line"
print(r'\nNew Line')    #print every character in the string, "\nNew Line"

# Output:
>>> 
    New Line
    \nNew Line
```
Without 'r', Python interprets '\n' as a single newline character. With 'r', Python keeps '\n' as two characters, a backslash ('\\') and 'n'. We want the pattern to reach the regular expression engine literally, before Python applies any escape processing of its own.

Reference: https://docs.python.org/2/reference/lexical_analysis.html#grammar-token-stringprefix
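
The difference between the two spellings can also be checked by counting characters. This is a small sketch; the variable names are chosen only for illustration.

```python
import re

normal = '\n'   # one character: a newline
raw = r'\n'     # two characters: a backslash followed by 'n'

print(len(normal), len(raw))

# A raw string is the same as a normal string with the backslash doubled
print(raw == '\\n')

# With the r prefix, the pattern \d+ reaches the re module unchanged
digits = re.findall(r'\d+', 'room 101')
print(digits)
```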

### Exercise

* a) In the string a = "hello! How are you?" use regular expressions with wild card for 2nd and 3rd characters to search for 'hello' and assign it to variable wildcard_search

In [None]:
# Follow the above pattern for wild card search
#write your code below


### Solution code

```python

re.search(r'h..lo', a).group()

```

## Numbers and repetitions

\d is used to match a single digit. A fixed number of repetitions is matched using
{x}, where x is the number of repetitions.

```python
re.search(r'\d{3}-\d{3}-\d{4}', 'the phone number of John is 109-876-5432').group()

```
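
Besides an exact count, a range of repetitions can be given as {m,n}. The sketch below compares the two forms on an invented sample string; `\b` is added so that only whole numbers are matched.

```python
import re

# Hypothetical sample text, used only for illustration
text = 'ids: 7, 42, 123, 98765'

# Exactly three digits, standing alone as a whole number
exactly_three = re.findall(r'\b\d{3}\b', text)

# Between two and three digits, standing alone as a whole number
two_to_three = re.findall(r'\b\d{2,3}\b', text)

print(exactly_three)
print(two_to_three)
```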

### Exercise

* a) In the phone number text contact = 'the phone number of Jane is 201.442.4536', use regular expressions to obtain the phone number, assign it to variable phone_no and print it.

In [None]:
contact = 'the phone number of Jane is 201.442.4536'

#write your code below


### Solution code

```python

phone_no = re.search(r'\d{3}\.\d{3}\.\d{4}', contact).group()  # '\.' matches a literal period; a bare '.' would match any character
print(phone_no)

```



## Customized Patterns

In the following string, we want to find all the names that resemble 'Johnson'.
```python
str1 = "asdJiohn$onurbfaJohnsonuensjohnsoniuey123456"
import re
result = re.findall(r'[Jj]......n|[Jj].....n', str1) 
#In regular expression, [ ] can be used to define any characters that we want to find, for example, [Jj2] can find letter J, j or number 2.
result

# Output
>>> ['Jiohn$on', 'Johnson', 'johnson']
```
Before using regular expression, we need to inspect the original string to find useful patterns. 
Being able to extract useful information and then clean the data will be very important for a data scientist.

Below is part of a receipt, and we want to extract the UPC numbers.
```python
str1 = """WOODBURY COMMON 303 RED APPLE COURT
CENTRAL VALLEY, NY 10917 (845) 928-4465
SALE
CRSGR MINI BNT STCHL-SV/N 192643979659
F32202 SVNII 1 @ $298.00 SAVINGS:	($208.60)
GIFT ITEM
30’/. OFF CLEARANCE ($26.82) 20’/. OFF $200 ($12,52)
CRSGR MINI BNT STCHL-SV/N 192643979659
F32202 SVNII 1 @ $298.00 SAVINGS:	($208.60)
GIFT ITEM
30'/. OFF CLEARANCE ($26.82) 20% OFF $200 ($12.52)
CRSGR MINI BNT STCHL-SV/N 192643979659
F32202 SVNII 1 @ $298.00 SAVINGS:	($208.60)
GIFT ITEM
30% OFF CLEARANCE ($26.82) 20% OFF $200 ($12.52)
CRSGR MIN SRRA SAT-SV/NII 192643980211
F27591 SVNII 1 @ $298.00 SAVINGS:	($208,60)
GIFT ITEM
30% OFF CLEARANCE ($26.82) 20% OFF $200 ($12.52)
CRSGR MIN SRRA SAT-SV/NII l92643980211
F27591 SVNII 1 @ $298.00 SAVINGS:	($208.60)
GIFT ITEM
30% OFF CLEARANCE ($26.82) 20% OFF $200 ($12.52)
SUBTOTAL TAX - 8.1250%

ITEMCS) SOLD 5 ITEM(S) RETURNED 0

(231) 231 - 8675
873 876 2980
s876 237 3276
u,,"""

#Find upc numbers(every upc number contains 12 digits). Because the above string contains a upc number that starts with letter l instead of number 1, we need to use '\w' to find all the '12 characters' numbers.
import re
re.findall(r'\b\w{12}\b', str1)          
#In this example, '\b' marks a word boundary, the position between a word character and a non-word character

# Output
>>> ['192643979659',
 '192643979659',
 '192643979659',
 '192643980211',
 'l92643980211']


#Find phone numbers: start with '(' or without, 3 digits, followed by 1 or more non-digits, another 3 digits, followed by 1 or more non-digits, another 4 digits, ending at a word boundary.

re.findall(r'\(?\d{3}\D+\d{3}\D+\d{4}\b', str1)

# Output
>>> ['(845) 928-4465', '(231) 231 - 8675', '873 876 2980', '876 237 3276']
```
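
The effect of `\b` is easier to see on a short string. In this sketch (sample sentence and variable names invented for illustration), the same pattern is run with and without word boundaries.

```python
import re

# Hypothetical sample text, used only for illustration
sentence = 'cat concat cat. catalog'

# Without boundaries, 'cat' is also found inside 'concat' and 'catalog'
without_boundary = [m.start() for m in re.finditer(r'cat', sentence)]

# With \b on both sides, only the stand-alone word 'cat' is matched
with_boundary = [m.start() for m in re.finditer(r'\bcat\b', sentence)]

print(without_boundary)
print(with_boundary)
```

Note that the second 'cat' is followed by a period, not by whitespace, and is still matched: `\b` matches a position, not a whitespace character.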

### Exercise
In the below exercise find all the style numbers. In the receipt, the style numbers contain 6 characters, starting with 1 letter or digit, followed by 5 digits.

In [None]:
str1 = """WOODBURY COMMON 303 RED APPLE COURT
CENTRAL VALLEY, NY 10917 (845) 928-4465
SALE
CRSGR MINI BNT STCHL-SV/N 192643979659
F32202 SVNII 1 @ $298.00 SAVINGS:	($208.60)
GIFT ITEM
30’/. OFF CLEARANCE ($26.82) 20’/. OFF $200 ($12,52)
CRSGR MINI BNT STCHL-SV/N 192643979659
F32202 SVNII 1 @ $298.00 SAVINGS:	($208.60)
GIFT ITEM
30'/. OFF CLEARANCE ($26.82) 20% OFF $200 ($12.52)
CRSGR MINI BNT STCHL-SV/N 192643979659
F32202 SVNII 1 @ $298.00 SAVINGS:	($208.60)
GIFT ITEM
30% OFF CLEARANCE ($26.82) 20% OFF $200 ($12.52)
CRSGR MIN SRRA SAT-SV/NII 192643980211
F27591 SVNII 1 @ $298.00 SAVINGS:	($208,60)
GIFT ITEM
30% OFF CLEARANCE ($26.82) 20% OFF $200 ($12.52)
CRSGR MIN SRRA SAT-SV/NII l92643980211
F27591 SVNII 1 @ $298.00 SAVINGS:	($208.60)
GIFT ITEM
30% OFF CLEARANCE ($26.82) 20% OFF $200 ($12.52)
SUBTOTAL TAX - 8.1250%

ITEMCS) SOLD 5 ITEM(S) RETURNED 0

(231) 231 - 8675
873 876 2980
s876 237 3276
u,,"""

import re
#write your code below


### Solution code

```python

style_no = re.findall(r'\b\w\d{5}\b', str1)
print(style_no)

```


## Additional String Operations

### Accessing the characters

The individual characters of a string can be accessed by using indexing. Similarly, a range of characters of a string can be accessed by using slicing. In Python, string indices start from 0.

```python
str1 = 'A String example'
print('str1 = ', str1)

# Output
>>>str1 =  A String example


#Print the first character
print('str1[0] = ', str1[0])

# Output
>>>str1[0] =  A

```


In order to access a range of characters a slice could be used.

```python
#To access the substring exam from str1, we could slice from 9 to 13th character (13 excluded) 
print('str1[9:13] = ', str1[9:13])

# Output
>>>str1[9:13] =  exam

```
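
Slices can also omit a boundary or count from the end with negative indices. A brief sketch using the same str1; the variable names are made up for illustration.

```python
str1 = 'A String example'

last_char = str1[-1]     # negative indices count from the end
last_word = str1[-7:]    # from the 7th character from the end up to the end
first_part = str1[:8]    # from the beginning up to index 8 (8 excluded)
middle = str1[2:8]       # characters at indices 2 to 7

print(last_char)
print(last_word)
print(first_part)
print(middle)
```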

Because we can use indices to access substrings, there is another way to search: the built-in string method find(), which returns the index of the first character of the first matching substring. For example:
```python
str1 = 'A String example is an example'
str1.find('exam')

# Output
>>> 9

#If we define a starting index in find() function, we will get the index of the first substring's first character after that.
str1.find('exam', 10)

# Output
>>> 23
```
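
When the substring is absent, find() returns -1, whereas the related index() method raises a ValueError. The sketch below shows both, plus a loop that collects every position by restarting the search after each hit. The names `missing`, `index_raised` and `positions` are illustrative.

```python
str1 = 'A String example is an example'

# find() signals "not found" with -1
missing = str1.find('banana')

# index() signals "not found" with an exception
try:
    str1.index('banana')
except ValueError:
    index_raised = True
else:
    index_raised = False

# Collect all starting positions of 'exam'
positions = []
start = str1.find('exam')
while start != -1:
    positions.append(start)
    start = str1.find('exam', start + 1)

print(missing, index_raised, positions)
```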

### Exercise

In the below exercise access the substring "Python" from the input string,  "Its awesome working with Python!" and assign it to variable str_python

In [None]:
str_input = 'Its awesome working with Python!'

#Add code to fetch the string "Python" from above input
#Use slicing of string


### Solution code

```python

str_python = str_input[25:31]
# We can use find() to search for the starting index, so this can also be written as the following:
# str_python = str_input[str_input.find('Python'):str_input.find('Python')+6]
print('str_python = ', str_python)

```

## Split a string

We can use re.split() to take strings apart. In the below example, we split str1 at each white-space character.
```python
import re
str1 = "A String example is an example"
result = re.split(r'\s', str1)
result1 = re.split(r'\s', str1, maxsplit=2)  #Here we split str1 only at the first two occurrences.
print(result)
print(result1)

# Output:
>>> ['A', 'String', 'example', 'is', 'an', 'example']
    ['A', 'String', 'example is an example']
```
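
re.split() is most useful when several different separators are involved; for plain whitespace the string method split() is enough. A minimal sketch with an invented sample record:

```python
import re

# Hypothetical sample text, used only for illustration
record = 'apple, banana;cherry  date'

# Split on commas, semicolons or whitespace; '+' treats a run of separators as one
parts = re.split(r'[,;\s]+', record)
print(parts)

# For whitespace only, str.split() without arguments does the same job
words = 'A String example'.split()
print(words)
```
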
### Exercise
Given a string, split it at only the first '-' sign.

In [None]:
import re
str1 = "Area code:123-456-7890 is the number"
# write your code below


### Solution code
```python
re.split('-', str1, maxsplit=1)
```


## Replace a String

String replace() method is used to replace a given substring within a string.

```python
str1 = 'firststring'
print('str1 = ', str1)

# Output
>>>str1 =  firststring

#Replace the substring 'first' with 'second'
str2 = str1.replace('first','second')
print('str2 = ', str2)

# Output
>>>str2 =  secondstring

```
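
replace() also accepts an optional count that limits how many occurrences are replaced, and re.sub() replaces by pattern instead of by literal text. A short sketch with invented sample strings:

```python
import re

text = 'spam spam spam'

# Replace every occurrence
all_replaced = text.replace('spam', 'eggs')

# Replace only the first occurrence
first_replaced = text.replace('spam', 'eggs', 1)

# Replace every digit with '#', using a regular expression
masked = re.sub(r'\d', '#', 'card 1234')

print(all_replaced)
print(first_replaced)
print(masked)
```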

### Exercise

In the below exercise, in the string input_str, replace "Hi" with "Hello" and assign the result to variable output_str

In [None]:
input_str = 'Hi There!'
#Replace Hi with Hello
input_str.replace('Hi', 'Hello')

### Solution code

```python

output_str = input_str.replace('Hi','Hello')
print('output_str = ', output_str)

```

### Join Two Strings

The string join() method concatenates the items of an iterable, placing the given string between consecutive items.
For example, if you want to add a hyphen character between each character in a second string, you could use join().

```python
str1 = '-'
print('str1 = ', str1)

#Insert str1 between the characters of 'word'
str2 = str1.join('word')
print('str2 = ', str2)
```
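
join() is used even more often with a list of strings than with a single string. A small sketch (the lists are invented for illustration); items that are not strings must be converted first.

```python
# Joining the characters of a string, as in the example above
hyphenated = '-'.join('word')

# Joining a list of words with a space
sentence = ' '.join(['A', 'String', 'example'])

# Numbers must be converted to strings before joining
numbers_text = ', '.join(str(n) for n in [1, 2, 3])

print(hyphenated)
print(sentence)
print(numbers_text)
```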

### Reverse a String

The built-in reversed() function returns an iterator that yields the characters of a given string in reverse order. Passing that iterator to join() builds the reversed string.

```python
string="12345"
print("".join(reversed(string)))
```
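
An alternative that is not used in the example above is slicing with a step of -1, which gives the same result in one expression:

```python
string = "12345"

# reversed() plus join(), as shown above
reversed_by_join = "".join(reversed(string))

# A slice with step -1 walks the string backwards
reversed_by_slice = string[::-1]

print(reversed_by_join)
print(reversed_by_slice)
```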


### Exercise

In the below exercise given two strings string1 and string2, join both strings by reversing string2 and assign it to variable string3. Print the resultant string3.

In [None]:
string1 = ','

string2 = 'abcde'
string3 = string1.join(reversed(string2))
print(string3)

### Solution code

```python

string3=string1.join(reversed(string2))
print(string3)

```

## Differentiating numbers from strings

In a data science context, you often need to treat numeric values differently from strings. This gets complicated because the input usually comes from text files such as csv files, where every value arrives as a string.

In such cases, you need a mechanism to handle numbers separately from strings. Before you can do that, you need to separate them out using Python constructs.

```python
a = "10"
b = "100.5"
try:
    intvalue = int(a)
except ValueError:
    # Oops, it wasn't an int, and that's fine
    pass
else:
    # It was an int, and now we have the int value
    outintvalue = intvalue/2

try:
    floatvalue = float(b)
except ValueError:
    # Oops, it wasn't a float, and that's fine
    pass
else:
    # It was a float, and now we have the float value
    outvalue = floatvalue*2


```
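
The two try blocks can be folded into one helper. The function below is a hypothetical sketch (the name `parse_value` is not from the original): it tries int first, then float, and returns the string unchanged if neither conversion succeeds.

```python
def parse_value(text):
    """Return text as an int or a float when possible, otherwise as the original string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            # This conversion failed; try the next one
            continue
    return text

# Sample inputs as they might arrive from a csv file
values = [parse_value(item) for item in ['10', '100.5', 'abc']]
print(values)
print([type(value).__name__ for value in values])
```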

### Exercise

Given an input variable, "inputvar", write Python code to output a variable "outvar". If the input is a string that is not an integer, output "String:" followed by the input string. If the input is an integer value, output 2*inputvar.

In [None]:
inputvar = "30"
try:
    intvalue = int(inputvar)
except ValueError:
    outvar = "string:" + inputvar
    pass
else:
    outvar = intvalue*2
print(outvar)


### Solution code

```python

try:
    intvalue = int(inputvar)
except ValueError:
    outvar = "String:" + inputvar
    pass

else:
    # It was an int, and now we have the int value
    outvar = intvalue*2
    
print(outvar)

```

## Date and Time operations

In Python, the datetime module supplies classes for manipulating dates and times in both simple and complex ways.

The datetime module contains classes such as date, time and datetime that provide a number of functions to deal with dates, times and time intervals. Instances of date and datetime are objects in Python. When you manipulate them, you are working with the objects themselves and not with strings.

date – Manipulate just the date (month, day, year) <br>
time – Time independent of the day (hour, minute, second, microsecond)<br>
datetime – Combination of date and time (month, day, year, hour, minute, second, microsecond)<br>
timedelta – A duration of time used for manipulating dates<br>
tzinfo – An abstract class for dealing with time zones<br>


```python
from datetime import date
from datetime import time
from datetime import datetime
today = date.today()
now = datetime.now()
print(today)
print(now)

```
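
Besides asking for the current moment, each class can be instantiated with explicit values. A minimal sketch; the particular date and time are arbitrary sample values.

```python
from datetime import date, time, datetime

d = date(2018, 8, 14)        # year, month, day
t = time(19, 6, 46)          # hour, minute, second
dt = datetime.combine(d, t)  # merge a date and a time into one datetime

print(d)
print(t)
print(dt)
```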

### Exercise

Write Python code to import date, time and datetime from the datetime library. Then assign the current date and time to the variable today_now. Hint: use the datetime class.

In [None]:
#write your code below
from datetime import date
from datetime import datetime
today_now = datetime.today()
today_now

### Solution code

```python

from datetime import date
from datetime import time
from datetime import datetime
today_now = datetime.today()
print(today_now)

```


## Date operations

The date object returned by date.today() has several attributes. We can print the individual day, month and year by using the day, month and year attributes.

For example:

```python
from datetime import date
today = date.today()

day = today.day
print("Day component is ", day)


month = today.month
print("Month component is ", month)


year = today.year
print("Year component is ", year)



```


## Time operations

A datetime object exposes the hour, minute and second as individual attributes.

For example:

```python

from datetime import datetime
today_now = datetime.today()
print(today_now)

hour = today_now.hour
print("Hour component is ", hour)


minute = today_now.minute
print("Minute component is ", minute)


seconds = today_now.second
print("Second component is ", seconds)


```
### Exercise

From the variable today_now, extract the day, month, year, hour, minute and seconds into variables with the respective names and print each of the six components. For example, if the timestamp is 2018-08-14 19:06:46.291600, then print: "Day component is 14". Similarly, print the other 5 components mentioned above.

In [None]:
#write your code below
datetime.today()

### Solution code

```python
from datetime import datetime
today_now = datetime.today()
print(today_now)

day = today_now.day
print("Day component is ", day)


month = today_now.month
print("Month component is ", month)


year = today_now.year
print("Year component is ", year)



hour = today_now.hour
print("Hour component is ", hour)


minute = today_now.minute
print("Minute component is ", minute)


seconds = today_now.second
print("Second component is ", seconds)

```

### Formatting DateTime Strings

In the datetime module, the strptime() method parses a formatted date string into a datetime object.
The string and the format that describes it are passed as parameters. See the code example below.

```python
from datetime import datetime

datetime_obj = datetime.strptime('Jul 12 2018 7:24PM', '%b %d %Y %I:%M%p')

```

The parts of the format string used above have the following meanings:

%d	Day of the month as a zero-padded decimal number. <br>
%b	Month as locale’s abbreviated name. <br>
%Y	Year with century as a decimal number. <br>
%I	Hour (12-hour clock) as a zero-padded decimal number. <br>
%M	Minute as a zero-padded decimal number. <br>
%p	Locale’s equivalent of either AM or PM. <br>

Reference: For additional details refer to the link https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior
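
The reverse direction, turning a datetime object back into text, uses strftime() with the same format codes. A short sketch; `%m`, `%H` (month number and 24-hour clock) and the variable name `iso_text` go beyond the codes listed above and are introduced only for illustration.

```python
from datetime import datetime

# Parse the string from the example above
datetime_obj = datetime.strptime('Jul 12 2018 7:24PM', '%b %d %Y %I:%M%p')

# Format the same moment with a different layout
iso_text = datetime_obj.strftime('%Y-%m-%d %H:%M')

print(datetime_obj)
print(iso_text)
```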

### Exercise

Take a string str_dttm = "Jul 14 2018 5:10AM" and parse it as a datetime object into a variable parsed_dttm. Print parsed_dttm.

In [None]:
from datetime import datetime
str_dttm = "Jul 14 2018 5:10AM"

# write your code below
parsed_dttm = datetime.strptime(str_dttm, '%b %d %Y %I:%M%p')
parsed_dttm

### Solution code

```python


parsed_dttm = datetime.strptime(str_dttm, '%b %d %Y %I:%M%p')
print(parsed_dttm)
```

## Time delta

A timedelta object represents a duration, the difference between two dates or times. This difference can be expressed in weeks, days, hours, minutes, seconds, milliseconds, microseconds or any combination thereof. timedelta has no year or month arguments, so a year has to be written as a number of days. Look at the code example below, where a one-year difference is specified in different ways.

```python
from datetime import timedelta

#One year represented as 365 days
year = timedelta(days=365)

#One year represented as combination of weeks, days etc.
another_year = timedelta(weeks=40, days=84, hours=23, minutes=50, seconds=600)  # adds up to 365 days

#compare if both are same. The result is True
year == another_year

```

For example:
1. We can easily find what the date was three weeks ago:

```python
from datetime import datetime
from datetime import timedelta
init_date = datetime.strptime('2019 07 01', '%Y %m %d')
result_date = init_date + timedelta(weeks=-3)
print(result_date)

# Output
>>>2019-06-10 00:00:00
```
2. We have a certain date, and want to get the previous week's Monday date based on that date:

```python
# Get the given week's Monday date first:
from datetime import datetime
from datetime import timedelta
from datetime import date
init_date = datetime.strptime('2019 07 04', '%Y %m %d')
Monday_date = init_date + timedelta(days=-(init_date.weekday()))
# Then add the desired weeks or days, etc.
result_date = Monday_date + timedelta(weeks=-1)
print(result_date)

# Output
>>>2019-06-24 00:00:00
                              
```
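
A timedelta is also what you get back when one date is subtracted from another. The sketch below uses two arbitrary sample dates and reads the result in days and in seconds; the variable names are made up for illustration.

```python
from datetime import date, timedelta

start = date(2019, 1, 1)
end = date(2019, 7, 1)

# Subtracting two dates gives a timedelta
gap = end - start
print(gap.days)
print(gap.total_seconds())

# Values are normalized internally: 36 hours is stored as 1 day and 43200 seconds
day_and_half = timedelta(hours=36)
print(day_and_half.days, day_and_half.seconds)
```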


### Exercise

a) In this exercise, build a time delta of exactly 200 days in terms of weeks, days, hours, minutes and seconds, following the criteria below, and assign it to the variable future_date. Print future_date.

b) Given a person's birthday, calculate how old they are.

#### Criteria for part a):<br>
Days must be between 1 and 7<br>
Hours must be between 1 and 23<br>
Minutes must be between 1 and 59<br>


In [None]:
from datetime import timedelta
from datetime import datetime
from datetime import date
dob = '09/10/1985'   #a person's birthday
# write your code below


### Solution code

```python

future_date = timedelta(weeks=28, days=3, hours=23, minutes=59, seconds=60)  # adds up to 200 days
print(future_date)

dob_date = datetime.strptime(dob, '%m/%d/%Y')
today = date.today()
# Check whether the person's birthday has already passed in the current year. If it has not, delta is 1; otherwise it is 0.
delta = int((today.month, today.day) < (dob_date.month, dob_date.day))
result = today.year - dob_date.year - delta
print(result)
```