### Exercise

#### Problem Description:
You are provided with a list of strings containing details of new students joining your college program, in the following format:

    ['"P, Sudhakar" <sudhakar.p@frabjous.com> "28-MAR-2019"', 
     '"Raghavendran, Sudip" <sragh@frabjous.com> "30/dec/2019"',
     '"Mayer, Anjana" <a.mayer@frabjous.com> "02 January, 2020"']

Process the input list to produce a list of tuples in the following format (after removing any duplicate tuples):

    [('Sudhakar', 'P', 'sudhakar.p@frabjous.com', 28, 3, 2019), 
     ('Sudip', 'Raghavendran', 'sragh@frabjous.com', 30, 12, 2019),
     ('Anjana', 'Mayer', 'a.mayer@frabjous.com', 2, 1, 2020)]

Note that the input date formats may vary, but each date always consists of the day, followed by the month as a word or its abbreviation, followed by the full year.

Write a procedural Python script using mappings (dictionaries), sets, and string methods, without external modules (no imports).

#### Solution Algorithm:

0. new_list <- New Empty list
1. Loop over the given input list of strings, processing one string at a time:
2. &emsp;Extract first name by doing the following:
3. &emsp;&emsp;2nd_part   <- Split the string at ',' and take the 2nd part
4. &emsp;&emsp;first_name <- Split 2nd_part at '"' and take the 1st part
5. &emsp;Extract last name by doing the following:
6. &emsp;&emsp;1st_part  <- Split the string at ',' and take the 1st part
7. &emsp;&emsp;last_name <- Split 1st_part at '"' and take the 2nd part
8. &emsp;Extract email by doing the following:
9. &emsp;&emsp;2nd_part <- Split the string at '<' and take the 2nd part
10. &emsp;&emsp;email   <- Split 2nd_part at '>' and take the 1st part
11. &emsp;Extract date parts by doing the following:
12. &emsp;&emsp;2nd_part    <- Split the string at '>' and take the 2nd part
13. &emsp;&emsp;2nd_part    <- Split 2nd_part at '"' and take the 2nd part
14. &emsp;&emsp;date_string <- Strip whitespace from 2nd_part and convert it to lower case
15. &emsp;&emsp;day_part    <- Slice the first 2 characters of date_string and convert them to an integer
16. &emsp;&emsp;month_part  <- Slice from the 4th character up to, but not including, the 5th-last character of date_string
17. &emsp;&emsp;month_part  <- Remove the ',' character, if present, from month_part
18. &emsp;&emsp;month_part  <- Convert month_part to a number using the dictionary mapping
19. &emsp;&emsp;year_part   <- Slice the last 4 characters of date_string and convert them to an integer
20. &emsp;Create a tuple from the extracted parts and append the tuple to new_list
21. To remove possible duplicates, convert new_list to a set and then convert it back to a list

In [1]:
raw_data = ['"P, Sudhakar" <sudhakar.p@frabjous.com> "28-MAR-2019"', 
            '"Mayer, Anjana" <a.mayer@frabjous.com> "02.jan.2020"', 
            '"Raghavendran, Sudip" <sragh@frabjous.com> "30/dec/2019"', 
            '"Mayer, Anjana" <a.mayer@frabjous.com> "02 January, 2020"']

In [2]:
month_mapping = {'jan': 1, 'january': 1,
                 'feb': 2, 'february': 2,
                 'mar': 3, 'march': 3,
                 'apr': 4, 'april': 4,
                 'may': 5, 
                 'jun': 6, 'june': 6,
                 'jul': 7, 'july': 7,
                 'aug': 8, 'august': 8,
                 'sep': 9, 'september': 9,
                 'oct': 10, 'october': 10,
                 'nov': 11, 'november': 11,
                 'dec': 12, 'december': 12}
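Because every abbreviation in this mapping is just the first three letters of the full month name, the same dictionary could also be generated from a list of full names. A minimal sketch, assuming that three-letter convention (it holds for all twelve months above); `month_names` and `month_mapping_alt` are illustrative names, not part of the original solution. It still uses only a dictionary and string slicing, so it stays within the no-imports rule.

```python
# Hypothetical compact construction of the month mapping (no imports needed).
month_names = ['january', 'february', 'march', 'april', 'may', 'june',
               'july', 'august', 'september', 'october', 'november', 'december']

month_mapping_alt = {}
for number, name in enumerate(month_names, start=1):
    month_mapping_alt[name] = number       # full name, e.g. 'january' -> 1
    month_mapping_alt[name[:3]] = number   # abbreviation, e.g. 'jan' -> 1

# 'may' is its own abbreviation, so there are 23 keys rather than 24.
print(len(month_mapping_alt))  # 23
```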

In [3]:
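def extract_first_name(student_details):
    # text between the first and second comma: ' Sudhakar" <sudhakar.p@...> "28-MAR-2019"'
    first_name = student_details.split(',')[1]
    # keep everything before the closing double quote: ' Sudhakar'
    first_name = first_name.split('"')[0]
    return first_name.strip()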

In [4]:
def extract_last_name(student_details):
    # text before the first comma: '"P'
    last_name = student_details.split(',')[0]
    # text after the opening double quote: 'P'
    last_name = last_name.split('"')[1]
    return last_name.strip()

In [5]:
def extract_email_address(student_details):
    # the email address is the text between '<' and '>'
    email_address = student_details.split('<')[1]
    email_address = email_address.split('>')[0]
    return email_address.strip()

In [6]:
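def extract_date_parts(student_details):
    # text after '>' is ' "28-MAR-2019"'; the date is the part inside the quotes
    date_str = student_details.split('>')[1]
    date_str = date_str.split('"')[1]
    date_str = date_str.strip().lower()
    
    # assumes a two-digit day followed by one separator, and a four-digit year at the end
    day_part = int(date_str[:2])
    month_part = date_str[3:-5].strip()       # e.g. 'mar' or 'january,'
    month_part = month_part.replace(',', '')  # drop the comma in '02 january, 2020'
    month_part = month_mapping[month_part]    # month name -> month number
    year_part = int(date_str[-4:])
    
    return day_part, month_part, year_part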
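The slicing in `extract_date_parts` depends on a two-digit day and fixed separator positions, so an input such as `'2 January, 2020'` would not be parsed correctly. A hedged alternative, sketched below, replaces every separator with a space and then splits on whitespace. `parse_date_tokens` and `MONTHS` are hypothetical names, and the separator set `'-/.,'` is an assumption that covers only the formats seen in `raw_data`.

```python
# Hypothetical token-based date parser; no imports, does not assume a two-digit day.
MONTHS = {'jan': 1, 'feb': 2, 'mar': 3, 'apr': 4, 'may': 5, 'jun': 6,
          'jul': 7, 'aug': 8, 'sep': 9, 'oct': 10, 'nov': 11, 'dec': 12}

def parse_date_tokens(date_str):
    # Assumption: day, month and year are separated by '-', '/', '.', ',' or spaces.
    normalized = date_str.lower()
    for separator in '-/.,':
        normalized = normalized.replace(separator, ' ')
    day_token, month_token, year_token = normalized.split()
    # Look the month up by its first three letters: 'january' -> 'jan'.
    return int(day_token), MONTHS[month_token[:3]], int(year_token)

print(parse_date_tokens('2 January, 2020'))  # (2, 1, 2020)
print(parse_date_tokens('30/dec/2019'))      # (30, 12, 2019)
```

Looking months up by their first three letters also accepts forms like `'sept'`, but it would silently accept misspellings such as `'janaury'` as well; the full-name mapping used in the notebook is stricter.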

In [7]:
def extract_details(raw_data):
    
    processed_data = []

    for student_detail in raw_data:

        date_part_day, date_part_month, date_part_year = extract_date_parts(student_detail)

        processed_data.append((extract_first_name(student_detail), 
                               extract_last_name(student_detail), 
                               extract_email_address(student_detail), 
                               date_part_day, 
                               date_part_month, 
                               date_part_year))
        
    return processed_data

In [8]:
list(set(extract_details(raw_data))) # set() removes duplicates but does not preserve the input order

[('Sudhakar', 'P', 'sudhakar.p@frabjous.com', 28, 3, 2019),
 ('Anjana', 'Mayer', 'a.mayer@frabjous.com', 2, 1, 2020),
 ('Sudip', 'Raghavendran', 'sragh@frabjous.com', 30, 12, 2019)]
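A set has no defined order, and because string hashing is randomized per Python process, the order of the list above can change between runs. If input order matters, a dictionary can deduplicate instead: its keys are unique and, from Python 3.7 on, keep insertion order, so the first occurrence of each tuple stays in place. In this sketch, `records` holds the four tuples that `extract_details(raw_data)` produces, typed in by hand so the cell runs on its own.

```python
# The four tuples extracted from raw_data, in input order (one duplicate).
records = [('Sudhakar', 'P', 'sudhakar.p@frabjous.com', 28, 3, 2019),
           ('Anjana', 'Mayer', 'a.mayer@frabjous.com', 2, 1, 2020),
           ('Sudip', 'Raghavendran', 'sragh@frabjous.com', 30, 12, 2019),
           ('Anjana', 'Mayer', 'a.mayer@frabjous.com', 2, 1, 2020)]

# dict keys are unique and ordered, so this keeps each tuple's first occurrence.
unique_records = list(dict.fromkeys(records))

for record in unique_records:
    print(record)
```

This still relies only on a mapping, so it fits the exercise's no-imports constraint.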

### Introduction to NumPy

#### First, a demonstration of numeric manipulation using lists

In [9]:
temperatures_past_week_in_c = [25.0, 23.8, 26.7, 22.2, 29.0] # in Celsius

#### Naive approach

In [10]:
temperatures_past_week_in_f = []
for c in temperatures_past_week_in_c:
    temperatures_past_week_in_f.append((c * 9/5) + 32)

#### Using list comprehension

In [11]:
temperatures_past_week_in_f = [((c * 9/5) + 32) for c in temperatures_past_week_in_c]

In [12]:
temperatures_past_week_in_f

[77.0, 74.84, 80.06, 71.96, 84.2]

#### Now, we look at how to do this in NumPy

In [13]:
import numpy

In [14]:
num_c = numpy.array(temperatures_past_week_in_c) # create ndarray from the list we already have

In [15]:
num_c

array([25. , 23.8, 26.7, 22.2, 29. ])

#### First, let's look at some of the ndarray properties

In [16]:
num_c.ndim # number of dimensions (also called axes)

1

In [17]:
num_c.shape # number of elements per axis

(5,)

In [18]:
num_c.size # total number of elements

5

In [19]:
num_c.dtype

dtype('float64')

In [20]:
num_c.itemsize # in bytes, per element

8
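`itemsize` follows from the dtype, and the total size of the data buffer is `size * itemsize`, which NumPy also exposes as `nbytes`. A small sketch converting the same temperatures to 32-bit floats with `astype()` (which returns a new, converted array) shows both values halving:

```python
import numpy

num_c = numpy.array([25.0, 23.8, 26.7, 22.2, 29.0])   # float64 by default
num_c_32 = num_c.astype(numpy.float32)                # new array with 4-byte elements

print(num_c.itemsize, num_c.nbytes)        # 8 40
print(num_c_32.itemsize, num_c_32.nbytes)  # 4 20
```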

In [21]:
num_c.data # a memoryview (Python buffer object) over the array's data; the address shown is that of the memoryview object

<memory at 0x7f71ded83ef0>

#### Vectorized code!

In [22]:
num_f = num_c * 9/5 + 32

In [23]:
num_f

array([77.  , 74.84, 80.06, 71.96, 84.2 ])
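Here the scalars `9/5` and `32` are applied to every element without an explicit Python loop. Arithmetic between two arrays of the same shape, and comparisons, also work element by element. The sketch below recreates `num_c` so it runs on its own; `daily_change`, `round_trip_c` and `warm_days` are illustrative names.

```python
import numpy

num_c = numpy.array([25.0, 23.8, 26.7, 22.2, 29.0])

# Scalars are broadcast: 9/5 and 32 are applied to every element.
num_f = num_c * 9 / 5 + 32

# Array-array arithmetic pairs up elements position by position:
# each day's temperature minus the previous day's.
daily_change = num_c[1:] - num_c[:-1]

# Converting back recovers the original values (up to floating-point rounding).
round_trip_c = (num_f - 32) * 5 / 9
print(numpy.allclose(round_trip_c, num_c))  # True

# Comparisons are element-wise too and return a boolean array.
warm_days = num_c > 25
print(warm_days.tolist())  # [False, False, True, False, True]
```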

#### Slicing: remember that in NumPy, a basic slice is a view that refers to the original data, not a copy

In [24]:
some_num_c = num_c[2:4]

In [25]:
some_num_c

array([26.7, 22.2])

In [26]:
some_num_c[1] += 10

In [27]:
some_num_c

array([26.7, 32.2])

In [28]:
num_c

array([25. , 23.8, 26.7, 32.2, 29. ])
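`numpy.shares_memory()` can confirm whether two arrays use the same buffer. In this sketch, the slice shares memory with `num_c`, while `.copy()` produces an independent array whose changes do not propagate back:

```python
import numpy

num_c = numpy.array([25.0, 23.8, 26.7, 22.2, 29.0])

slice_view = num_c[2:4]          # basic slice: a view on num_c's buffer
slice_copy = num_c[2:4].copy()   # explicit copy: its own buffer

print(numpy.shares_memory(num_c, slice_view))  # True
print(numpy.shares_memory(num_c, slice_copy))  # False

slice_copy[1] += 10   # changes only the copy
print(num_c[3])       # 22.2 (unchanged)
```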

#### Basic methods 

In [29]:
num_c.min()

23.8

In [30]:
num_c.max()

32.2

In [31]:
num_c.sum()

136.7

In [32]:
num_c.mean()

27.339999999999996

In [33]:
num_c.std()

2.993058636244871
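`ndarray.std()` defaults to `ddof=0`, i.e. it divides by N and returns the population standard deviation. For the sample standard deviation, pass `ddof=1` to divide by N - 1 instead. A short sketch using the values `num_c` holds after the slice modification above:

```python
import numpy

# num_c after some_num_c[1] += 10 changed 22.2 to 32.2
num_c = numpy.array([25.0, 23.8, 26.7, 32.2, 29.0])

population_std = num_c.std()    # ddof=0 (default): divide by N
sample_std = num_c.std(ddof=1)  # ddof=1: divide by N - 1

# The sample formula written out by hand, for comparison.
n = num_c.size
manual_sample_std = (((num_c - num_c.mean()) ** 2).sum() / (n - 1)) ** 0.5
print(numpy.isclose(sample_std, manual_sample_std))  # True
```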

#### A taste of multi-dimensional ndarrays

In [34]:
student_stats = numpy.array([(25.0, 90.0, 80.5), (28.0, 95.5, 85.0)])

In [35]:
student_stats

array([[25. , 90. , 80.5],
       [28. , 95.5, 85. ]])

In [36]:
student_stats.ndim

2

In [37]:
student_stats.shape

(2, 3)

In [38]:
student_stats.size

6

In [39]:
student_stats.dtype

dtype('float64')
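With more than one axis, aggregation methods such as `mean()` and `max()` accept an `axis` argument: `axis=0` collapses the rows (one result per column), and `axis=1` collapses the columns (one result per row).

```python
import numpy

student_stats = numpy.array([(25.0, 90.0, 80.5), (28.0, 95.5, 85.0)])

column_means = student_stats.mean(axis=0)  # one value per column
row_max = student_stats.max(axis=1)        # one value per row

print(column_means.tolist())  # [26.5, 92.75, 82.75]
print(row_max.tolist())       # [90.0, 95.5]
```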

#### Flatten a multi-dimensional array using ravel()

In [40]:
student_stats_flattened = student_stats.ravel()

In [41]:
student_stats_flattened

array([25. , 90. , 80.5, 28. , 95.5, 85. ])
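`ravel()` returns a view of the original data whenever it can (as it can for this contiguous array), whereas `flatten()` always returns a copy. The difference shows up when the result is modified:

```python
import numpy

student_stats = numpy.array([(25.0, 90.0, 80.5), (28.0, 95.5, 85.0)])

raveled = student_stats.ravel()      # view here, because the array is contiguous
flattened = student_stats.flatten()  # always a new copy

raveled[0] = 0.0     # writes through to student_stats
flattened[1] = 0.0   # leaves student_stats untouched

print(student_stats[0, 0])  # 0.0
print(student_stats[0, 1])  # 90.0
```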

#### Change the shape of an ndarray using reshape()

In [42]:
student_stats_reshaped = student_stats_flattened.reshape(3, 2)

In [43]:
student_stats_reshaped

array([[25. , 90. ],
       [80.5, 28. ],
       [95.5, 85. ]])
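`reshape()` requires the new shape to hold the same number of elements as the original, and one dimension may be given as `-1` to let NumPy infer it. A short sketch:

```python
import numpy

flat = numpy.array([25.0, 90.0, 80.5, 28.0, 95.5, 85.0])

# -1 lets NumPy infer this dimension from the total size: 6 / 2 = 3.
inferred = flat.reshape(2, -1)
print(inferred.shape)  # (2, 3)

# 4 x 2 = 8 elements cannot hold the 6 we have, so NumPy raises ValueError.
try:
    flat.reshape(4, 2)
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print('cannot reshape:', error)
```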