# Exploring Event Timing with Visualizations: Scatter Plot of Event Lag Time vs Date and Box Plot of Lag Time

In [469]:
# tools
from datetime import timedelta, datetime
import pandas as pd
import plotly.express as px
from csv import writer

Goal:
- Communicate events since last post

## Constructing Dataframe

In [451]:
# function to insert records
dates_and_labels_path = "dates_and_labels.csv"


def add_entry_csv(record, file_path):
    """
    Add an entry to a csv file.

    Parameters:
    record (list): A list of two items representing the entry to be added to the csv file. fmt [date,label]
    file_path (str): Path to the csv file.

    Returns:
    None
    """
    # a record must be exactly [date, label]
    assert len(record) == 2, "Record list must contain exactly 2 items"
    df_temp = pd.read_csv(file_path)
    if record[0] in df_temp["date"].tolist():
        # reject the record only if this date already has the same label
        assert (
            record[1] not in df_temp[df_temp["date"] == record[0]]["label"].tolist()
        ), "This entry is an exact duplicate"
    # newline="" stops the csv module from writing blank rows on Windows
    with open(file_path, "a", newline="") as file:
        writer_obj = writer(file)
        writer_obj.writerow(record)

In [452]:
# add records here
enter_date = "January 18, 2023"
enter_label = "Competition Project"
add_entry_csv([enter_date, enter_label], file_path=dates_and_labels_path)

AssertionError: This entry is an exact duplicate
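The duplicate check above can be sketched on its own as follows. This is a minimal illustration, not the notebook's exact code: `is_duplicate`, `append_record`, and the throwaway `demo.csv` file are hypothetical names, and the sketch raises `ValueError` rather than using `assert`, since assertions are skipped when Python runs with `-O`.

```python
import csv
import os
import tempfile

import pandas as pd


def is_duplicate(record, file_path):
    """Return True if the exact (date, label) pair is already in the CSV."""
    df = pd.read_csv(file_path)
    same_date = df["date"] == record[0]
    same_label = df["label"] == record[1]
    return bool((same_date & same_label).any())


def append_record(record, file_path):
    """Append a [date, label] record unless the same pair already exists."""
    if len(record) != 2:
        raise ValueError("record must contain exactly 2 items: [date, label]")
    if is_duplicate(record, file_path):
        raise ValueError("This entry is an exact duplicate")
    with open(file_path, "a", newline="") as file:
        csv.writer(file).writerow(record)


# Demo on a throwaway file (hypothetical data, not the real dates_and_labels.csv)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "w", newline="") as file:
    csv.writer(file).writerows(
        [["date", "label"], ["January 18, 2023", "Competition Project"]]
    )

# Same date with a new label is accepted
append_record(["January 18, 2023", "Other Event"], demo_path)

# Same date with the same label is rejected
try:
    append_record(["January 18, 2023", "Competition Project"], demo_path)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(duplicate_rejected)
```

Comparing both columns at once keeps two different events on the same date valid while still blocking a repeated entry.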

In [None]:
# constructing dataframe
df_main = pd.read_csv(
    dates_and_labels_path,
    parse_dates=["date"],
)
df_main.sort_values("date", inplace=True, ignore_index=True)

In [453]:
# functions to add columns: lag and time_left
def get_lag(df_dates):
    """
    Calculate time lag between consecutive dates in a pandas DataFrame.

    Parameters
    ----------
    df_dates : pandas DataFrame
        A pandas DataFrame containing at least a 'date' column.

    Returns
    -------
    None
        The function updates the input DataFrame with a new column 'lag'
        representing the time lag in days between consecutive dates in the
        'date' column.

    Example
    -------
    >>> import pandas as pd
    >>> df = pd.DataFrame({'date': pd.to_datetime(['2020-01-01', '2020-01-03', '2020-01-05'])})
    >>> get_lag(df)
    lag column added
    >>> print(df)
            date  lag
    0 2020-01-01  0.0
    1 2020-01-03  2.0
    2 2020-01-05  2.0
    """
    for indx in range(df_dates.shape[0] - 1, 0, -1):
        val = (df_dates.at[indx, "date"] - df_dates.at[indx - 1, "date"]).days
        df_dates.at[indx, "lag"] = val
    df_dates.at[0, "lag"] = 0
    print("lag column added")


def get_time_left(df_dates, end_date):
    """
    Calculates the number of days between each date in df_dates and end_date.

    Parameters
    ----------
    df_dates : pd.DataFrame
        DataFrame containing dates.
    end_date : datetime.datetime
        End date to calculate the difference between.

    Returns
    -------
    None
        Adds 'time_left' column to df_dates DataFrame.

    """

    for indx in range(0, df_dates.shape[0]):
        val = (end_date - df_dates.at[indx, "date"]).days
        df_dates.at[indx, "time_left"] = val
    print("time_left column added")
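For comparison, the same two columns can be computed without explicit loops. The sketch below is one possible vectorized variant, not the notebook's method: `add_lag_and_time_left` is a hypothetical helper, it returns a new DataFrame instead of modifying its input, and the three demo rows are taken from this notebook's data.

```python
import pandas as pd


def add_lag_and_time_left(df_dates, end_date):
    """Return a sorted copy with 'lag' and 'time_left' (both in days) added."""
    # sort_values returns a new DataFrame, so the input is left untouched
    out = df_dates.sort_values("date", ignore_index=True)
    # diff() is the gap to the previous row; the first row has none, so use 0
    out["lag"] = out["date"].diff().dt.days.fillna(0)
    # subtracting a date column from a single Timestamp broadcasts over rows
    out["time_left"] = (end_date - out["date"]).dt.days
    return out


demo = pd.DataFrame(
    {
        "date": pd.to_datetime(["2022-05-30", "2022-10-14", "2022-11-11"]),
        "label": ["origin", "last_post", "ML dataiku cert."],
    }
)
result = add_lag_and_time_left(demo, pd.to_datetime("December 31 2023"))
print(result)
```

Here `lag` comes out as floats because `diff()` produces a missing value for the first row before it is filled with 0.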

In [454]:
# run to add: lag and time_left columns
end_date = pd.to_datetime("December 31 2023")
get_lag(df_main)
get_time_left(df_main, end_date)
display(df_main)

lag column added
time_left column added


|   | date | label | lag | time_left |
|---|------|-------|-----|-----------|
| 0 | 2022-05-30 | origin | 0.0 | 580.0 |
| 1 | 2022-10-14 | last_post | 137.0 | 443.0 |
| 2 | 2022-11-11 | ML dataiku cert. | 28.0 | 415.0 |
| 3 | 2022-12-05 | DS Associate cert. | 24.0 | 391.0 |
| 4 | 2022-12-22 | Portfolio Project | 17.0 | 374.0 |
| 5 | 2023-01-01 | Job Apps | 10.0 | 364.0 |
| 6 | 2023-01-18 | Competition Project | 17.0 | 347.0 |
| 7 | 2023-01-23 | ESL job | 5.0 | 342.0 |
| 8 | 2023-01-25 | Portfolio Project | 2.0 | 340.0 |
| 9 | 2023-02-07 | current_post | 13.0 | 327.0 |


## Visuals

In [455]:
# index of the first event to plot; df_main[indx_fltr:] keeps that event and everything after it
indx_fltr = df_main.label[df_main.label == "ML dataiku cert."].index[0]

In [464]:
fig = px.scatter(
    df_main[indx_fltr:],
    x="date",
    y="lag",
    height=500,
    width=800,
    text="label",
    title="Scatter Plot of Event Lag Time vs Date, Labeled by Event Name",
)
fig.update_traces(textposition="top right")
fig.show()

Commentary:
- Since my last update on my journey pursuing DS/ML, I have gradually shifted from constant learning to focusing on application and interview competency.
- Some background events are not listed because I didn't record their completion dates. For example, between my ML Dataiku certification and the Portfolio Project (Dec 22), I learned to use the Dataiku platform to speed up the ML process, got comfortable with Git, and learned about neural networks and their implementation in Keras and PyTorch. I have also done various mini projects to become more organized and to do DS/ML for fun.

In [457]:
fig = px.box(
    df_main[indx_fltr:],
    y="lag",
    height=500,
    width=500,
    title="Time Elapsed Since Last Event: Box Plot of Lag",
)
fig.show()
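The numbers behind the box plot can be checked directly. The sketch below recomputes the median and quartiles with pandas from the lag values in the table above (events from "ML dataiku cert." onwards). Plotly may use a different quartile method than pandas' default linear interpolation, so the box edges on the chart can differ slightly from these values.

```python
import pandas as pd

# Lag values in days for rows 2-9 of the table above
lags = pd.Series([28, 24, 17, 10, 17, 5, 2, 13], name="lag")

# Quartiles with pandas' default linear interpolation
summary = lags.quantile([0.25, 0.5, 0.75])

print("min:", lags.min(), "max:", lags.max())
print(summary)
```

With these eight events, the median gap between events is about two weeks.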

Professional Career Accomplishments:

- Jan 23, 2023: Obtained a position as an ESL teacher in Korea.
- Nov 11, 2022: Received a Machine Learning Practitioner certificate from Dataiku.
- January 2023: Applied for multiple job opportunities.
- Jan 25, 2023: Completed a Kaggle competition-winner metadata analysis project.
- Jan 18, 2023: Participated in a DataCamp Competition.
- Dec 22, 2022: Completed a BOW & NER pipeline project for job descriptions.
- Dec 5, 2022: Acquired a Data Science Associate certificate.

Currently, I am confident in utilizing advanced tools, techniques, and concepts, and am capable of quickly adapting to new technologies to achieve project goals.



In [475]:
print(
    "Time until I reach one year into my Journey:",
    ((df_main.date[0] + timedelta(days=365)) - datetime.now()).days,
    "days",
)

Time until I reach one year into my Journey: 111 days
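The countdown is a plain date subtraction, and `.days` drops any partial day. The sketch below reproduces the arithmetic with a fixed timestamp in place of `datetime.now()`. The noon time on the date of `current_post` is an assumption, since the notebook does not record when the cell was run.

```python
from datetime import datetime, timedelta

origin = datetime(2022, 5, 30)  # first row of the table ("origin")
one_year_mark = origin + timedelta(days=365)

# Hypothetical fixed "now"; the notebook used datetime.now() instead
now = datetime(2023, 2, 7, 12, 0)

remaining = one_year_mark - now
print(remaining.days)
```

Because `.days` truncates, any run time after midnight on Feb 7, 2023 reports 111 days rather than 112.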


## What now?


Future Plans:
- Refine my portfolio to showcase accomplishments and projects.
- Network with people in the industry to learn about job opportunities and make connections.
- Prepare for technical interviews by reviewing algorithms, data structures, and coding challenges.
- Continue gaining practical experience by working on real-world projects/competitions.
- Continue learning and updating skills to remain up-to-date with industry trends.

This project's GitHub repository 🗿 -- https://github.com/c8garcia-T/Social-Media-Tech-Management

Connect 😁:
LinkedIn -- https://www.linkedin.com/in/8carlosgarcia

Portfolio 🎲:
Personal Website -- https://caringinsight.weebly.com/portfolio.html