# Exercise - Python Library - Data Cleanup

##### Problem Statement

You are given a list of dictionaries, each representing a different job role. Each dictionary has a `'job_skills'` key, whose value is a string representing a list of skills, and a `'job_date'` key, whose value is a date stored as a string. Your tasks:
- Convert `'job_date'` from a string into a `datetime` object.
- Convert `'job_skills'` from a string into an actual `list` object.
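Before writing any conversion code, it can help to pin down the shape each record should have afterwards. The sketch below uses `typing.TypedDict` to describe the raw and cleaned records. The class names `RawJob` and `CleanJob` are hypothetical, and the `list[str]` annotation assumes Python 3.9 or later.

```python
from datetime import datetime
from typing import TypedDict


# Hypothetical type names, used only to document the shape of each record.
class RawJob(TypedDict):
    job_title: str
    job_skills: str   # a list written as text, e.g. "['Python', 'SQL']"
    job_date: str     # a date written as text, e.g. '2023-05-12'


class CleanJob(TypedDict):
    job_title: str
    job_skills: list[str]  # a real list, e.g. ['Python', 'SQL']
    job_date: datetime     # a real datetime, e.g. datetime(2023, 5, 12)


# What the first record should look like once it has been cleaned
example: CleanJob = {
    'job_title': 'Data Scientist',
    'job_skills': ['Python', 'SQL', 'Machine Learning'],
    'job_date': datetime(2023, 5, 12),
}
print(example)
```

`TypedDict` adds no runtime checks; it only documents the expected keys and value types for readers and static type checkers.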


First, we'll create a list called `data_science_jobs` containing one dictionary per data science job role. Each dictionary has these keys:  
- `'job_title'`: the title of the job  
- `'job_skills'`: a string that represents a list of required skills  
- `'job_date'`: a string that represents a date  


```python
data_science_jobs = [
    {'job_title': 'Data Scientist', 'job_skills': "['Python', 'SQL', 'Machine Learning']", 'job_date': '2023-05-12'},
    {'job_title': 'Machine Learning Engineer', 'job_skills': "['Python', 'TensorFlow', 'Deep Learning']", 'job_date': '2023-05-15'},
    {'job_title': 'Data Analyst', 'job_skills': "['SQL', 'R', 'Tableau']", 'job_date': '2023-05-10'},
    {'job_title': 'Business Intelligence Developer', 'job_skills': "['SQL', 'PowerBI', 'Data Warehousing']", 'job_date': '2023-05-08'},
    {'job_title': 'Data Engineer', 'job_skills': "['Python', 'Spark', 'Hadoop']", 'job_date': '2023-05-18'},
    {'job_title': 'AI Specialist', 'job_skills': "['Python', 'PyTorch', 'AI Ethics']", 'job_date': '2023-05-20'}
]
```

If we print the first dictionary, we can see that the `'job_skills'` and `'job_date'` values are both strings.

```python
print(data_science_jobs[0])
```

```
{'job_title': 'Data Scientist', 'job_skills': "['Python', 'SQL', 'Machine Learning']", 'job_date': '2023-05-12'}
```
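Leaving these values as strings causes quiet problems in later analysis. This small sketch, using the first record, shows that `len()` and indexing operate on characters rather than skills, and that two date strings cannot be subtracted.

```python
job = {'job_title': 'Data Scientist',
       'job_skills': "['Python', 'SQL', 'Machine Learning']",
       'job_date': '2023-05-12'}

skills_text = job['job_skills']

# len() counts characters in the string, not the three skills
n_chars = len(skills_text)
print(n_chars)

# Indexing returns a single character ('['), not the first skill
first_char = skills_text[0]
print(first_char)

# Strings do not support date arithmetic, so this raises TypeError
date_error = None
try:
    job['job_date'] - '2023-05-10'
except TypeError as err:
    date_error = err
print('Subtracting date strings failed:', date_error is not None)
```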


### `datetime` Conversion (`job_date`)

First, let's import the `datetime` class from the `datetime` module and test it by getting the current date and time.

```python
from datetime import datetime

# Show the current local date and time
datetime.now()
```

```
datetime.datetime(2024, 9, 16, 16, 51, 39, 888219)
```

Now let's look at the `'job_date'` value of the first job.

```python
test_date = data_science_jobs[0]['job_date']

print(test_date)
print(type(test_date))
```

```
2023-05-12
<class 'str'>
```


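We'll now use `datetime.strptime()` to parse it. The second argument is a format string describing how the text is laid out: `%Y` is the four-digit year, `%m` is the month number, and `%d` is the day of the month. Because the string contains no time, the resulting time is midnight (`00:00:00`).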

```python
print(datetime.strptime(test_date, '%Y-%m-%d'))
```

```
2023-05-12 00:00:00
```
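To see the format string at work in both directions, here is a small sketch: `strftime()`, the reverse of `strptime()`, turns the object back into text, and a string that does not match the format raises `ValueError`. The `'%d/%m/%Y'` layout is only an illustrative choice.

```python
from datetime import datetime

# Parse text into a datetime using the same format codes as above
parsed = datetime.strptime('2023-05-12', '%Y-%m-%d')
print(parsed)

# strftime() uses the same codes to produce text again
iso_text = parsed.strftime('%Y-%m-%d')
uk_text = parsed.strftime('%d/%m/%Y')  # illustrative day/month/year layout
print(iso_text, uk_text)

# A string laid out differently from the format string is rejected
mismatch_error = None
try:
    datetime.strptime('12/05/2023', '%Y-%m-%d')
except ValueError as err:
    mismatch_error = err
print('Mismatched format raised ValueError:', mismatch_error is not None)
```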


It works! Now let's convert every date in the list.

```python
from datetime import datetime

# Re-create the original list so this cell can be re-run: the loop below
# overwrites each 'job_date' string with a datetime object in place.
data_science_jobs = [
    {'job_title': 'Data Scientist', 'job_skills': "['Python', 'SQL', 'Machine Learning']", 'job_date': '2023-05-12'},
    {'job_title': 'Machine Learning Engineer', 'job_skills': "['Python', 'TensorFlow', 'Deep Learning']", 'job_date': '2023-05-15'},
    {'job_title': 'Data Analyst', 'job_skills': "['SQL', 'R', 'Tableau']", 'job_date': '2023-05-10'},
    {'job_title': 'Business Intelligence Developer', 'job_skills': "['SQL', 'PowerBI', 'Data Warehousing']", 'job_date': '2023-05-08'},
    {'job_title': 'Data Engineer', 'job_skills': "['Python', 'Spark', 'Hadoop']", 'job_date': '2023-05-18'},
    {'job_title': 'AI Specialist', 'job_skills': "['Python', 'PyTorch', 'AI Ethics']", 'job_date': '2023-05-20'}
]

for job in data_science_jobs:
    job['job_date'] = datetime.strptime(job['job_date'], '%Y-%m-%d')

data_science_jobs
```


```
[{'job_title': 'Data Scientist',
  'job_skills': "['Python', 'SQL', 'Machine Learning']",
  'job_date': datetime.datetime(2023, 5, 12, 0, 0)},
 {'job_title': 'Machine Learning Engineer',
  'job_skills': "['Python', 'TensorFlow', 'Deep Learning']",
  'job_date': datetime.datetime(2023, 5, 15, 0, 0)},
 {'job_title': 'Data Analyst',
  'job_skills': "['SQL', 'R', 'Tableau']",
  'job_date': datetime.datetime(2023, 5, 10, 0, 0)},
 {'job_title': 'Business Intelligence Developer',
  'job_skills': "['SQL', 'PowerBI', 'Data Warehousing']",
  'job_date': datetime.datetime(2023, 5, 8, 0, 0)},
 {'job_title': 'Data Engineer',
  'job_skills': "['Python', 'Spark', 'Hadoop']",
  'job_date': datetime.datetime(2023, 5, 18, 0, 0)},
 {'job_title': 'AI Specialist',
  'job_skills': "['Python', 'PyTorch', 'AI Ethics']",
  'job_date': datetime.datetime(2023, 5, 20, 0, 0)}]
```
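Because the loop replaces each string in place, running it a second time on the same list fails: `strptime()` expects a string and raises `TypeError` when handed a `datetime`. One way to make the conversion safe to repeat is to skip values that are already converted. The helper name `parse_job_date` below is hypothetical, a sketch rather than part of the exercise.

```python
from datetime import datetime


def parse_job_date(value):
    """Hypothetical helper: parse a 'YYYY-MM-DD' string, or return a datetime unchanged."""
    if isinstance(value, datetime):
        return value  # already converted on an earlier run
    return datetime.strptime(value, '%Y-%m-%d')


jobs = [
    {'job_title': 'Data Scientist', 'job_date': '2023-05-12'},
    {'job_title': 'Data Analyst', 'job_date': '2023-05-10'},
]

# Apply the conversion twice; the second pass leaves the datetimes alone
for _ in range(2):
    for job in jobs:
        job['job_date'] = parse_job_date(job['job_date'])

print(jobs)
```

Re-creating the list at the top of the cell, as the notebook does, is the simpler fix when the raw data is small and defined inline.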

### `ast` Conversion (`job_skills`)

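Now let's convert the skills to a list. First, notice that the `'job_skills'` value is still a string, even though it looks like a Python list: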

```python
test_skills = data_science_jobs[0]['job_skills']

print(test_skills)
type(test_skills)
```

```
['Python', 'SQL', 'Machine Learning']
str
```

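The `ast` module's `literal_eval()` function evaluates a string containing a Python literal (here, a list of strings) and returns the corresponding Python object:

```python
import ast

print(ast.literal_eval(test_skills))
type(ast.literal_eval(test_skills))
```

```
['Python', 'SQL', 'Machine Learning']
list
```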
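`literal_eval()` is the safe choice here because, unlike the built-in `eval()`, it only accepts literal syntax and refuses to run arbitrary expressions. The sketch below shows that a function call is rejected with `ValueError`, and that `json.loads()` is not a drop-in alternative for these strings because JSON requires double quotes.

```python
import ast
import json

# Python literal syntax, so the single-quoted strings parse fine
skills = ast.literal_eval("['SQL', 'R', 'Tableau']")
print(skills)

# A function call is not a literal, so literal_eval refuses it without running it
call_error = None
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as err:
    call_error = err
print('literal_eval rejected the call:', call_error is not None)

# JSON only allows double-quoted strings, so this input is not valid JSON
json_error = None
try:
    json.loads("['SQL', 'R', 'Tableau']")
except json.JSONDecodeError as err:
    json_error = err
print('json.loads failed:', json_error is not None)
```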

Nice! Now let's convert the skills for every job, converting the dates in the same loop.

```python
import ast
from datetime import datetime

# Start again from the raw strings, since the previous cell already converted 'job_date'
data_science_jobs = [
    {'job_title': 'Data Scientist', 'job_skills': "['Python', 'SQL', 'Machine Learning']", 'job_date': '2023-05-12'},
    {'job_title': 'Machine Learning Engineer', 'job_skills': "['Python', 'TensorFlow', 'Deep Learning']", 'job_date': '2023-05-15'},
    {'job_title': 'Data Analyst', 'job_skills': "['SQL', 'R', 'Tableau']", 'job_date': '2023-05-10'},
    {'job_title': 'Business Intelligence Developer', 'job_skills': "['SQL', 'PowerBI', 'Data Warehousing']", 'job_date': '2023-05-08'},
    {'job_title': 'Data Engineer', 'job_skills': "['Python', 'Spark', 'Hadoop']", 'job_date': '2023-05-18'},
    {'job_title': 'AI Specialist', 'job_skills': "['Python', 'PyTorch', 'AI Ethics']", 'job_date': '2023-05-20'}
]

for job in data_science_jobs:
    job['job_date'] = datetime.strptime(job['job_date'], '%Y-%m-%d')  # str -> datetime
    job['job_skills'] = ast.literal_eval(job['job_skills'])           # str -> list

data_science_jobs
```

```
[{'job_title': 'Data Scientist',
  'job_skills': ['Python', 'SQL', 'Machine Learning'],
  'job_date': datetime.datetime(2023, 5, 12, 0, 0)},
 {'job_title': 'Machine Learning Engineer',
  'job_skills': ['Python', 'TensorFlow', 'Deep Learning'],
  'job_date': datetime.datetime(2023, 5, 15, 0, 0)},
 {'job_title': 'Data Analyst',
  'job_skills': ['SQL', 'R', 'Tableau'],
  'job_date': datetime.datetime(2023, 5, 10, 0, 0)},
 {'job_title': 'Business Intelligence Developer',
  'job_skills': ['SQL', 'PowerBI', 'Data Warehousing'],
  'job_date': datetime.datetime(2023, 5, 8, 0, 0)},
 {'job_title': 'Data Engineer',
  'job_skills': ['Python', 'Spark', 'Hadoop'],
  'job_date': datetime.datetime(2023, 5, 18, 0, 0)},
 {'job_title': 'AI Specialist',
  'job_skills': ['Python', 'PyTorch', 'AI Ethics'],
  'job_date': datetime.datetime(2023, 5, 20, 0, 0)}]
```
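With real lists and datetimes in place, ordinary analysis becomes straightforward. This sketch rebuilds the cleaned records shown above so it runs on its own, then counts skills with `collections.Counter`, finds the most recent posting, and measures the span of posting dates. These particular questions are illustrative choices, not part of the exercise.

```python
from collections import Counter
from datetime import datetime

# The cleaned records, matching the output above
jobs = [
    {'job_title': 'Data Scientist', 'job_skills': ['Python', 'SQL', 'Machine Learning'], 'job_date': datetime(2023, 5, 12)},
    {'job_title': 'Machine Learning Engineer', 'job_skills': ['Python', 'TensorFlow', 'Deep Learning'], 'job_date': datetime(2023, 5, 15)},
    {'job_title': 'Data Analyst', 'job_skills': ['SQL', 'R', 'Tableau'], 'job_date': datetime(2023, 5, 10)},
    {'job_title': 'Business Intelligence Developer', 'job_skills': ['SQL', 'PowerBI', 'Data Warehousing'], 'job_date': datetime(2023, 5, 8)},
    {'job_title': 'Data Engineer', 'job_skills': ['Python', 'Spark', 'Hadoop'], 'job_date': datetime(2023, 5, 18)},
    {'job_title': 'AI Specialist', 'job_skills': ['Python', 'PyTorch', 'AI Ethics'], 'job_date': datetime(2023, 5, 20)},
]

# Lists can be iterated skill by skill, so counting is a one-liner
skill_counts = Counter(skill for job in jobs for skill in job['job_skills'])
print(skill_counts.most_common(2))  # [('Python', 4), ('SQL', 3)]

# datetimes compare chronologically, so max() finds the most recent posting
newest = max(jobs, key=lambda job: job['job_date'])
print(newest['job_title'])  # AI Specialist

# Subtracting datetimes gives a timedelta
dates = [job['job_date'] for job in jobs]
span = max(dates) - min(dates)
print(span.days)  # 12
```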