# Updating the Date String to an Integer
The goal of this notebook is to convert the job posting date to a number so that we can filter on it. We discovered that Pinecone stores this value as a string, not as a date object, and Pinecone's range filters (such as `$gte` and `$lte`) apply only to numeric metadata.

String = "2024-02-27"
Int = Number of days since the Unix epoch (1970-01-01), e.g. 19780 for the string above

This notebook is exploratory. It helps us understand the structure of the data, proves that a single record can be converted to the correct date format, and downloads the list of IDs in a namespace so that the script ConvertDateStringToNumeric.py can do the heavy lifting.

In [37]:
import os

from dotenv import load_dotenv
from pinecone import Pinecone

load_dotenv('.env')

index = "job-postings"
namespace = "full-posting-description"
pineconeApiKey = os.getenv("PINECONE_API_KEY")

In [38]:
pc = Pinecone(api_key=pineconeApiKey)
# Rebinds `index` from the index name (a string) to the Index handle
index = pc.Index(index)


print(index.describe_index_stats())

# Save every ID in the namespace, one per line, for ConvertDateStringToNumeric.py
with open('full-posting-description.txt', 'w') as file:
    for ids_list in index.list(namespace=namespace):
        for vector_id in ids_list:  # ids_list is a list of IDs
            # Write each ID to the file followed by a newline
            file.write(vector_id + '\n')


{'dimension': 768,
 'index_fullness': 0.0,
 'namespaces': {'full-posting-description': {'vector_count': 27733},
                'full_posting_description': {'vector_count': 1},
                'job-description': {'vector_count': 1},
                'job-title': {'vector_count': 27732},
                'short-job-description': {'vector_count': 27730}},
 'total_vector_count': 83197}
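
The ID file written by the cell above has one ID per line. As a rough sketch of how a follow-up script could consume it, the IDs can be read back in fixed-size batches. This is not the actual ConvertDateStringToNumeric.py; the helper name `read_ids_in_batches` and the batch size of 100 are assumptions made for illustration.

```python
from itertools import islice


def read_ids_in_batches(path, batch_size=100):
    """Yield lists of up to `batch_size` IDs from a file with one ID per line.

    Hypothetical helper; the default batch size is an arbitrary choice.
    """
    with open(path) as file:
        # Strip newlines and skip blank lines
        ids = (line.strip() for line in file if line.strip())
        while True:
            batch = list(islice(ids, batch_size))
            if not batch:
                break
            yield batch
```

Each batch could then be passed to a fetch call, for example `index.fetch(ids=batch, namespace=namespace)`, instead of fetching one record at a time.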


# The next block contains the code that converts the date string to an integer

In [29]:
from datetime import datetime

def date_to_integer(date_str, epoch_str='1970-01-01'):
    """Return the number of whole days between epoch_str and date_str ('YYYY-MM-DD')."""
    date = datetime.strptime(date_str, '%Y-%m-%d')
    epoch = datetime.strptime(epoch_str, '%Y-%m-%d')
    delta = date - epoch
    return delta.days




In [30]:
# Example
job_posting_date = "2024-01-11"
date_numeric = date_to_integer(job_posting_date)
print(date_numeric)

19733
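
To sanity-check a converted value, it helps to go the other way. The sketch below uses only the standard library; `integer_to_date` is a hypothetical helper name that does not appear elsewhere in this notebook.

```python
from datetime import date, timedelta


def integer_to_date(days_since_epoch):
    """Convert days since 1970-01-01 back to a 'YYYY-MM-DD' string."""
    return (date(1970, 1, 1) + timedelta(days=days_since_epoch)).isoformat()


print(integer_to_date(19733))  # 2024-01-11
```

Round-tripping the example value returns the original string, which suggests the conversion has no off-by-one error.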


In [28]:
def get_date_range_for_last_n_days(days=45, base_date=None):
    """Return (start, end) as days since epoch; end is base_date (default: today)."""
    if base_date is None:
        base_date = datetime.today()
    else:
        base_date = datetime.strptime(base_date, '%Y-%m-%d')

    epoch = datetime.strptime('1970-01-01', '%Y-%m-%d')
    end_date_numeric = (base_date - epoch).days
    start_date_numeric = end_date_numeric - days

    return start_date_numeric, end_date_numeric

# For today's date
start_date_numeric, end_date_numeric = get_date_range_for_last_n_days()
print(start_date_numeric, end_date_numeric)


19786 19831
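
Once the dates are stored as integers, a range like the one printed above could be turned into a Pinecone metadata filter. This is a minimal sketch: `build_date_filter` is a hypothetical helper, and it assumes the numeric value is stored under the same `job_posting_date` key, which may not match what the conversion script actually writes.

```python
def build_date_filter(start_date_numeric, end_date_numeric, field="job_posting_date"):
    """Build a metadata filter dict for an inclusive range of days since epoch.

    Assumption: `field` holds the numeric date; adjust if the script uses another key.
    """
    return {field: {"$gte": start_date_numeric, "$lte": end_date_numeric}}


date_filter = build_date_filter(19786, 19831)
print(date_filter)

# Assumed usage with an existing index handle and a query embedding
# (query_embedding is a placeholder, not defined in this notebook):
# results = index.query(vector=query_embedding, top_k=10, namespace=namespace,
#                       filter=date_filter, include_metadata=True)
```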


### The block below is a short "Hello World" that proves the string-to-integer date conversion works as expected.

In [31]:
vector_id = "0002a176-7601-462e-8e8d-c42d25fddaf8"


vectorFromPinecone = index.fetch(ids=[vector_id], namespace=namespace)
jobPostingDateFromPineCone = vectorFromPinecone['vectors'][vector_id]['metadata']['job_posting_date']

print(jobPostingDateFromPineCone)

numberOfDaysSinceEpoch = date_to_integer(jobPostingDateFromPineCone)

print(numberOfDaysSinceEpoch)


2024-01-08
19730
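
The cell above only reads and converts the value. Writing it back could look like the sketch below, which is not the actual ConvertDateStringToNumeric.py. `convert_record_date` is a hypothetical helper, and overwriting `job_posting_date` in place (rather than writing a new key) is an assumption. It relies on `index.update` with `set_metadata`, which changes only the listed metadata fields.

```python
from datetime import datetime


def convert_record_date(index, vector_id, namespace, field="job_posting_date"):
    """Fetch one record, convert its date string to days since epoch, and write it back.

    Assumption: the numeric value overwrites `field` in place.
    """
    response = index.fetch(ids=[vector_id], namespace=namespace)
    value = response['vectors'][vector_id]['metadata'][field]
    if not isinstance(value, str):
        return value  # already numeric, so nothing to convert
    days = (datetime.strptime(value, '%Y-%m-%d') - datetime(1970, 1, 1)).days
    # set_metadata only touches the listed keys; other metadata is left as-is
    index.update(id=vector_id, set_metadata={field: days}, namespace=namespace)
    return days
```

The type check makes the function safe to re-run on a record that was already converted, which matters if a bulk run is interrupted and restarted.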
