# Freedmen's Bureau Analysis 

## Task #0: Configure an LLM

### Import libraries that we'll use in this notebook

In [52]:
from langchain_community.llms import Ollama
from langchain_community.chat_models import ChatOllama
from langchain_openai import OpenAI, ChatOpenAI
from langchain.prompts import PromptTemplate, ChatPromptTemplate
from langchain.chains import LLMChain
from langchain.schema import HumanMessage, SystemMessage, AIMessage

import json
import pandas as pd
import ast
import textwrap


## Task #1: Parse Entities from Simple Examples (Warmup)

### Samples to be able to parse correctly

In [53]:

inputs = [
    "John James agrees to pay $50/month to RJ Hampshire for work on the Farm",
    "Elizabeth James will pay $30 per month to Levi Rodgers for Gardening",
    "Johnson Ollaman will pay $1.25 per day to both John Smith and Jane Smith for teaching the children of the community",
    "Claire Daniels charges $50 weekly to local community members for cooking classes, emphasizing the joy of healthy eating.",
    "Marcus Wellby commits to donating $500 annually to the Green Earth Foundation for environmental conservation efforts.",
    "Dr. Helena Russell charges $100 per hour for providing guidance and support to medical students, aiming to enhance their clinical skills and knowledge.",
    "Keith Galli charges $0 to watch his YouTube content; the least you could do is smash that like button and subscribe, hehehe xD",
    "The local sports club agrees to pay $75 each to coaches Sarah Miller, Danny Glover, Alex Reed, and Jamie Fox for conducting a weekend sports clinic.",
    """This Agreement made this 14th day of August A.D. 1865, by and between F.R.J. Terry of the county of Copiah and State of Mississippi of the first part, and the person hereinafter named and undersigned, Freedmen of the second part [[?]] That for the purpose of working in the [[?]] known as Beagley's [[?]] Yard in the county aforesaid for two months commencing on the 14th day of August 1865 and terminating on the 14th day of October 1865. The said F.R.J. Terry party of the first part, in consideration of the [[?]] and conditions hereinafter mentioned on the part of the party of the second part agrees to pay said laborer "10" ten dollars per month and furnish free of charge clothing and good of good quality and sufficient quantity, good and sufficient quarters, and kind and humane treatment. And it is further agreed that in case the said F.R.J. Terry shall fail, neglect, or refuse to fulfill any of the obligations assumed by him, he shall besides the legal recourse left to the party aggrieved render this contract liable to amendment by the Provost Marshal of Freedmen. And it is agreed on the part of the party of second part that he will well and faithfully perform such labor as the said F.R.J. Terry may require of him for the time aforesaid, nor exceeding ten hours per day in summer and nine hours in winter. And in case the said laborer shall absent himself from or refuse to perform the labor herein promised, he shall loose the time and be punished as such manner as the Provost Marshal shall deem propper.""",
]

outputs = [
    [{"payer": "John James", "recipient": "RJ Hampshire", "amount": 50, "pay frequency": "monthly", "description": "farming"}],
    [{"payer": "Elizabeth James", "recipient": "Levi Rodgers", "amount": 30, "pay frequency": "monthly", "description": "gardening"}],
    [{"payer": "Johnson Ollaman", "recipient": "John Smith", "amount": 1.25, "pay frequency": "daily", "description": "teaching the children of the community"}, {"payer": "Johnson Ollaman", "recipient": "Jane Smith", "amount": 1.25, "pay frequency": "daily", "description": "teaching the children of the community"}],
    [{"payer": "Claire Daniels", "recipient": "Local community members", "amount": 50, "pay frequency": "weekly", "description": "cooking classes"}],
    [{"payer": "Marcus Wellby", "recipient": "Green Earth Foundation", "amount": 500, "pay frequency": "yearly", "description": "donation for environmental conservation"}],
    [{"payer": "Dr. Helena Russell", "recipient": "Medical students", "amount": 100, "pay frequency": "hourly", "description": "mentorship and clinical skill enhancement"}],
    [{"payer": None, "recipient": "Keith Galli", "amount": 0, "pay frequency": None, "description": "YouTube content"}],
    [{"payer": "The local sports club", "recipient": "Sarah Miller", "amount": 75, "pay frequency": "one-time", "description": "weekend sports clinic"},
     {"payer": "The local sports club", "recipient": "Danny Glover", "amount": 75, "pay frequency": "one-time", "description": "weekend sports clinic"},
     {"payer": "The local sports club", "recipient": "Alex Reed", "amount": 75, "pay frequency": "one-time", "description": "weekend sports clinic"},
     {"payer": "The local sports club", "recipient": "Jamie Fox", "amount": 75, "pay frequency": "one-time", "description": "weekend sports clinic"}],
    [{"payer": "F.R.J. Terry", "recipient": "Freedmen", "amount": 10, "pay frequency": "monthly", "description": "working in the yard"}],
]

### Your code here

In [54]:
system_message = textwrap.dedent("""
  From the following historical document text, extract the following items and return them as JSON in this format:
  {{ "results":
  [{{
    "payer": "<Payer Name>",
    "recipient": "<Payee Name>",
    "amount": <Amount Paid>, # This is the amount of money in USD
    "pay frequency": "<Frequency Paid>", # Only options are "hourly", "daily", "weekly", "monthly", "yearly", "one-time", or "other"
    "description": "<Description of Work>" # This is the description of the work being done
  }},
  {{<ITEM 2>}}, {{<ITEM 3>}}, ...]
  }}
  So for the following text:
  "Jane Doe will pay $50/month to John Smith for work on the Farm"

  the output text generated would be:
  {{ "results":
  [{{
    "payer": "Jane Doe",
    "recipient": "John Smith",
    "amount": 50,
    "pay frequency": "monthly",
    "description": "work on the farm"
  }}]
  }}

  IMPORTANT: Return only the JSON object mapping "results" to a list of objects and nothing more. If there are no matches, return "results" as an empty list.

  If there are multiple matches, list the full JSON details for each as an item in the results array.
  """
)

In [55]:
# chat = ChatOpenAI(model_name="gpt-4-1106-preview", response_format={"type":"json_object"})
chat = ChatOpenAI()
# chat = ChatOllama(model="llama2", format="json")

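def get_output(text):
    # One-shot example: show the model the same JSON shape that the system prompt asks for
    example_answer = json.dumps({"results": outputs[0]})
    messages = [
        SystemMessage(content=system_message),
        HumanMessage(content=inputs[0]),
        AIMessage(content=example_answer),
        HumanMessage(content=text),
    ]
    output = chat.invoke(messages)
    return output.content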

def parse_output(output):
    print(output)
    return json.loads(output)

print(inputs[1])

test = get_output(inputs[1])
answer = parse_output(test)

answer["results"]





Elizabeth James will pay $30 per month to Levi Rodgers for Gardening
{
"results":[
{
  "payer": "John James",
  "recipient": "RJ Hampshire",
  "amount": 50,
  "pay frequency": "monthly",
  "description": "farming"
},
{
  "payer": "Elizabeth James",
  "recipient": "Levi Rodgers",
  "amount": 30,
  "pay frequency": "monthly",
  "description": "Gardening"
}
]
}


[{'payer': 'John James',
  'recipient': 'RJ Hampshire',
  'amount': 50,
  'pay frequency': 'monthly',
  'description': 'farming'},
 {'payer': 'Elizabeth James',
  'recipient': 'Levi Rodgers',
  'amount': 30,
  'pay frequency': 'monthly',
  'description': 'Gardening'}]
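
The response above can be checked against the expected `outputs` entry rather than by eye. The following is a minimal sketch of such a check; `normalize` and `score_records` are hypothetical helper names, not part of the notebook, and the `predicted` list is copied from the response printed above. It reports how many expected records were found and how many extra records the model returned.

```python
def normalize(record):
    """Lower-case and trim string values so 'Gardening' and 'gardening' compare equal."""
    return {k: v.strip().lower() if isinstance(v, str) else v for k, v in record.items()}


def score_records(predicted, expected, fields=("payer", "recipient", "amount", "pay frequency")):
    """Return the fraction of expected records that appear in predicted, matched on `fields`."""
    def key(record):
        record = normalize(record)
        return tuple(record.get(field) for field in fields)

    predicted_keys = {key(record) for record in predicted}
    hits = sum(1 for record in expected if key(record) in predicted_keys)
    return hits / len(expected) if expected else 1.0


# Records copied from the model response shown above
predicted = [
    {"payer": "John James", "recipient": "RJ Hampshire", "amount": 50,
     "pay frequency": "monthly", "description": "farming"},
    {"payer": "Elizabeth James", "recipient": "Levi Rodgers", "amount": 30,
     "pay frequency": "monthly", "description": "Gardening"},
]
# The expected answer for the second sample input
expected = [
    {"payer": "Elizabeth James", "recipient": "Levi Rodgers", "amount": 30,
     "pay frequency": "monthly", "description": "gardening"},
]

recall = score_records(predicted, expected)
extra = len(predicted) - len(expected)
print(f"recall={recall}, extra records={extra}")
```

Here the expected record is found, but one extra record is reported: the response also repeats the John James example that was supplied as the one-shot demonstration.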

## Task #2a: Grab Labor Contract Rows from Kaggle

In [56]:
import pandas as pd

df = pd.read_csv('./input/contract-records.csv')
df['transcription_text'] = df['transcription_text'].str.replace('_x000D_',' ')

aa = df[df['sub_category'] == 'Apprenticeship Agreement']
aa.head()

aa.to_csv('./input/apprenticeship-agreements-test.csv', index=False)
# labor = df[df['sub_category'] == 'Labor Contracts']
# labor.head()

# pd.set_option('display.max_colwidth', None)
# pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', None)

# # pd.reset_option('display.max_colwidth')
# # pd.reset_option('display.max_columns')
# # pd.reset_option('display.max_rows')



# labor.head()


## Task #2: Connect pages that belong to the same document

In [57]:
aa = pd.read_csv('./input/apprenticeship-sample.csv')
aa.head()

Unnamed: 0,project_id,category,sub_category,transcription_text,document_url,expected_id
0,15884,Contracts,Apprenticeship Agreement,Apprenticeship \n \nJames Conoly \n \nMary 12 years' \n \nSeptbr. 4. 1865 \n,https://transcription.si.edu/transcribe/15884/NMAAHC-004567419_00033,0
1,15884,Contracts,Apprenticeship Agreement,"Indentures of Apprenticeship. \nState of North Carolina } \nRobeson Co } \n \nThis Indenture made the 4 day of Sep 1865 between O.B. Todd Lieut and Act Super Bureau Refugees Freedman, &c. for Robeson County of the one part and James Conoly of the other part Witnesseth That the said [[strikethrough]] James Conoly [[/strikethrough]] [[insertion]] O.B. Todd Lieut and Asst Supt [[/insertion]] Bureau Refug Freedman &c doth put place and bind unto the said James Conoly one orphant named Mary age 12 year after the the manner of an apprentice and servent, untill the said apprentice shall attain the age of twenty one years, during all which time the said apprentice his master faithfully shall serve, and his lawfull commands every where obey. And the said James Conoly doth covenant and promise and agree, that he will teach and instruct the said apprentice, or cause him to be instructed, to read and wright and that he will constantly find and provide for said apprentice, during the term aforesaid, sufficient diet, washing, lodging and apparal, fitting for an apprentice; and also all other things necessary both in sickness and helth. \n \nIn witness whereof, the parties to these presents have set there hand and seal the day and year above written \n \nJames Conoly \nO.B. Todd Lieut and Asst Supt",https://transcription.si.edu/transcribe/15884/NMAAHC-004567419_00034,0
2,15884,Contracts,Apprenticeship Agreement,Indenture of Apprenticeship \n \nSept 14th 1865 \n \nNeil & Ellen \nto \nJames McCallum,https://transcription.si.edu/transcribe/15884/NMAAHC-004567419_00035,1
3,15884,Contracts,Apprenticeship Agreement,"Indenture of Apprenticeship \nState of North Carolina Robeson County \n \nThis Indenture made the fourteenth day of Septr A.D. 1865 between James Sinclair Agent of Bureau of Refugees, Freedmen & Abandoned lands and Therefore legal guardian of Colored orphan children of the one part, and James McCallum Planter of the above County and State, of the other part. Witnesseth that the said James Sinclair agent doth put, place and bind unto the said James McCallum the orphans named Neill & Ellen aged who were formerly the property of the mother in law of said James McCallum, but were raised by him in his own family, to live after the manner of Apprentices and Servants until said Apprentices shall attain the age of twenty one years; during all which time, the said Apprentices their Masters faithfully shall serve, and his lawful Commands everywhere obey. And the said James McCallum doth covenant and promise and agree that he will teach and instruct the said Apprentices or cause them to be taught and instructed to read & write; and that he will constantly find and provide for said Apprentices during the time aforesaid, sufficient diets, washing lodging and apparel, fitting for an Apprentice and also all other things necessary both in sickness and health. \n \nIn witness whereof, the parties to these present have set their hands and seals, the day and year above written \n \nJames Sinclair {seal} \nAgt of Bureau \nJames McCallum {seal} \nMaster",https://transcription.si.edu/transcribe/15884/NMAAHC-004567419_00036,1
4,15884,Contracts,Apprenticeship Agreement,"Indenture of Apprenticeship \n \nSept 14th 1865 \n \norphan Jane, 15 years \nArchibald McMillan \n",https://transcription.si.edu/transcribe/15884/NMAAHC-004567419_00037,2


In [58]:
import ast

def from_same_document(document1, document2):
    # Fail fast if the pages are not from the same project
    if document1['project_id'] != document2['project_id']:
        return False

    prompt_template = """
    Your job is to determine whether the following two pages come from the same document. Return a boolean value of True or False.
    Key things to look for are the same names, locations, or dates appearing on both pages.

    Page 1:
    {document1}

    Page 2:
    {document2}

    ---
    IMPORTANT: Return only the boolean value and nothing else.
    """

    prompt = PromptTemplate(input_variables=['document1', 'document2'], template=prompt_template)
    full_prompt = prompt.format(document1=document1['transcription_text'], document2=document2['transcription_text'])

    # invoke() returns an AIMessage; the reply text is in .content
    response_raw = chat.invoke(full_prompt).content

    # Expects the literal text True or False; any other reply raises an error
    return ast.literal_eval(response_raw.strip())
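
`ast.literal_eval` only succeeds when the model replies with exactly `True` or `False`. Replies such as `true`, `False.`, or a quoted `'True'` either raise an error or evaluate to a non-empty string, which is truthy. A more forgiving parser is sketched below; `parse_bool` is a hypothetical helper, not something the notebook defines, and the set of reply variants it accepts is an assumption about what a chat model might return.

```python
def parse_bool(response_text):
    """Interpret a model reply as a boolean, tolerating case, quotes, and a trailing period."""
    # Remove surrounding whitespace, then any wrapping quotes, backticks, or periods
    cleaned = response_text.strip().strip("`'\".").lower()
    if cleaned.startswith("true"):
        return True
    if cleaned.startswith("false"):
        return False
    raise ValueError(f"Could not interpret model response as a boolean: {response_text!r}")


print(parse_bool("True"))
print(parse_bool("  false.\n"))
print(parse_bool("'True'"))
```

Raising on anything unrecognized keeps an ambiguous reply from being silently treated as a match.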

In [59]:
import time

def group_documents_in_df(df):
    merge_id = 0
    for i in range(len(df)):
        # Brief pause between LLM calls
        time.sleep(0.25)
        if i % 100 == 0:
            print(f"Processed {i} rows")

        # Work with index labels rather than positions so that .loc can be used below
        index1 = df.index[i]
        index2 = df.index[i+1] if i + 1 < len(df) else None

        # Use loc to access the rows
        document1 = df.loc[index1]
        document2 = df.loc[index2] if index2 is not None else None

        if document2 is not None:
            if from_same_document(document1, document2):
                # Set the value using loc
                df.loc[index1, 'merge_id'] = int(merge_id)
                df.loc[index2, 'merge_id'] = int(merge_id)
                continue
            
        df.loc[index1, 'merge_id'] = int(merge_id)
        merge_id += 1

    return df
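
The grouping logic can be tried without any LLM calls by passing in the comparison function. The sketch below is a simplified equivalent of the loop above, not the notebook's own code: `assign_merge_ids` and `shares_long_word` are hypothetical names, the word-overlap rule is only a stand-in for `from_same_document`, and the four-row DataFrame is made-up test data loosely modeled on the sample pages. It also compares the result to an `expected_id` column like the one in `apprenticeship-sample.csv`.

```python
import pandas as pd


def assign_merge_ids(df, same_document):
    """Give consecutive rows the same id for as long as same_document(previous, current) is True."""
    merge_ids = []
    current_id = 0
    for position in range(len(df)):
        if position > 0 and not same_document(df.iloc[position - 1], df.iloc[position]):
            current_id += 1  # the current page starts a new document
        merge_ids.append(current_id)
    return df.assign(merge_id=merge_ids)


def shares_long_word(page1, page2):
    """Stand-in for the LLM check: same project and at least one shared word of 6+ characters."""
    if page1["project_id"] != page2["project_id"]:
        return False
    words1 = {w for w in page1["transcription_text"].split() if len(w) >= 6}
    words2 = {w for w in page2["transcription_text"].split() if len(w) >= 6}
    return bool(words1 & words2)


# Made-up pages for illustration only
pages = pd.DataFrame({
    "project_id": [1, 1, 1, 2],
    "transcription_text": [
        "Apprenticeship James Conoly Mary",
        "Indenture between Todd and James Conoly",
        "Neil and Ellen bound to McCallum",
        "Statement of Jim Tew",
    ],
    "expected_id": [0, 0, 1, 2],
})

grouped = assign_merge_ids(pages, shares_long_word)
accuracy = (grouped["merge_id"] == grouped["expected_id"]).mean()
print(grouped[["merge_id", "expected_id"]])
print(f"accuracy={accuracy}")
```

Comparing ids directly only works when both columns number documents in the same order, starting from 0. Because it returns a new frame with an integer `merge_id` column, this version also avoids the float ids that appear when the column is created row by row with `.loc`.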

In [60]:
aa = pd.read_csv('./input/apprenticeship-agreements-test.csv')
aa.head()

Unnamed: 0,project_id,category,sub_category,transcription_text,document_url
0,11406,Contracts,Apprenticeship Agreement,Copy \n \nAgreement of Apprenticeship \nBu \nMrs. Kate V.Chamblin \nof the first part and \n \n1st Lt. Geo. W. Rollins \nV.R.C. U.S. Army Agt \nfor Carroll Parish La. of \nthe second part \n \nMinor Orphans Freed \nJames Samenett \nBetty Taylor \nMary Taylor \n \nLake Providence \nCarroll Parish La \nDated Oct. 1 1866 \nExec'd Nov. 3 1866 \n,https://transcription.si.edu/transcribe/11406/NMAAHC-004567395_00439
1,11406,Contracts,Apprenticeship Agreement,"[H 132 ENCL] \n \nCopy \n \nAgreement of Apprenticeship. \n \nThis agreement in two parts made & entered into this 1"" day of October A. D. 1866 by & between Mrs Kate V. Chambliss of the first part and 1st Lieut Geo. W. Rollins Vet Res. Corps U.S.A. Agent for Carroll Parish La. Bur of Ref. Freedn & Aband Lands La and by virtue of authority contained in Circular No 25 dated Hd. Qrs. Bur Ref Freedn & Abd Lands New Orleans Louisiana Octr 31"" 1865 Guardian for Minors & orphans of Freedmen for Carroll Parish La party hereto of the second part. Witnesseth That James Samenett, Betty Taylor and Mary Taylor minor orphans of African decent are hereby bound & apprenticed to service to the said Mrs Kate V. Chambliss party of the first part & undersigned during their years of minority commencing & ending as follows: James Samenett aged 14 years, commencing on the 1"" day of Oct. A. D. 1866 & terminating on the 1"" day of October 1870. Betty Taylor aged 9 years commencing on the 1"" day of Oct. A. D. 1866 & ending on the 1"" day of Oct. A. D. 1872. Mary Taylor aged 7 years commencing on the 1"" day of October A. D. 1866 & terminating on the 1"" day of October A. D. 1874. And it",https://transcription.si.edu/transcribe/11406/NMAAHC-004567395_00440
2,11406,Contracts,Apprenticeship Agreement,"is agreed on the part of the party of the first part & undersigned Mrs Kate V. Chambliss that for the consideration of the faithfull services to be rendered by the within named & said minor orphans the said minor orphans shall receive comfortable clothing, board medical treatment when sick, a reasonable amount of schooling and permission to attend church each Sabbath and at the end or termination of their term of apprenticeship the said minor orphans shall be allowed to retain all articles of their personal apparel. \n \nThe said parties do hereby mutually agree that all laws or parts of laws enacted or that may be enacted by the United States or the State of Louisiana establishing laws for the welfare or government of Freedmens minors or orphans or for the government of the same under laws establishing the Bur of Ref Freed & Ab^d Lands or in any way affecting the provisions of this agreement is & shall be made a part of this agreement and that all laws applying to white minors or orphans shall be extended to the said & within named minor orphans during the conti-",https://transcription.si.edu/transcribe/11406/NMAAHC-004567395_00441
3,11406,Contracts,Apprenticeship Agreement,"continuance of this agreement & finally that this agreement shall expire & terminate when the aforesaid minor male orphan shall have arrived at the age of eighteen years and the minor female orphans shall have arrived at the age of fifteen years respectively. \n \nIn testimony whereof the said parties have hereunto affixed their names to this agreement. Done at Lake Providence Louisiana Parish of Carroll on the third day of November A. D. 1866. \n \nsigned Kate V. Chambliss \nsigned Geo. W Rollins \n1st Lieut. VRC. U.S. Army \nAgent for Carrol Parish \nLa. Bureau of R. F. & A Lands \nLouisiana \n \nExecuted in Presence of \nSigned Ben C. Johnson \nJackson Chambliss \nJohn A. Ginst [[?]] \n \n[[image - three boxes oriented vertically representing Internal Revenue Stamps that appeared on original document, reading as follows: 2¢ / Int Rev / [[illegible]]; a single illegible initial appears beneath each ""stamp""]] \n \nI certify the above to be a true copy of the original Indenture in possession of Mrs Kate V. Chambliss. \n \nTho H. Hay \n1st Lieut 42d Infty \nA S. A. Comr",https://transcription.si.edu/transcribe/11406/NMAAHC-004567395_00442
4,15369,Contracts,Apprenticeship Agreement,"[[preprinted]] \nBureau of Refugees, Freedman and Abandoned Lands, \nHead-Quarters, Asst. Commissioner, State of North Carolina \nRaleigh, N.C., ^[[March 2""]] 186^[[6]] [[/preprinted]] \n \nJim Tew (Colored) having made statements that he and his brother Charley & Joe have been bound by one Richd Holmes of Sampson County to one Lewis Tew. - Know all by these that the said Richd Holmes has no authority for binding or apprenticing minors, and that all indentures that may have been made by him are null and void. the aforesaid Jim Tew and his brother Charley and Joe are free to engage in any contract or to work with such as may wish to hire them. \n \nAll officers of the Bureau, Capts of Police or others are requested to see that these men are protected in their rights. \n \nBy order of Col. Whittlesey, Asst. Commissioner \n \nLieut. and A. A. A. Genl.",https://transcription.si.edu/transcribe/15369/NMAAHC-004567415_00230


In [51]:
group_documents_in_df(aa)

Processed 0 rows


KeyboardInterrupt: 

In [None]:
# Show full column values instead of truncating long transcriptions
pd.set_option('display.max_colwidth', None)

aa.head()


Unnamed: 0,project_id,category,sub_category,transcription_text,document_url,merge_id
0,11406,Contracts,Apprenticeship Agreement,Copy \n \nAgreement of Apprenticeship \nBu \nMrs. Kate V.Chamblin \nof the first part and \n \n1st Lt. Geo. W. Rollins \nV.R.C. U.S. Army Agt \nfor Carroll Parish La. of \nthe second part \n \nMinor Orphans Freed \nJames Samenett \nBetty Taylor \nMary Taylor \n \nLake Providence \nCarroll Parish La \nDated Oct. 1 1866 \nExec'd Nov. 3 1866 \n,https://transcription.si.edu/transcribe/11406/NMAAHC-004567395_00439,0.0
1,11406,Contracts,Apprenticeship Agreement,"[H 132 ENCL] \n \nCopy \n \nAgreement of Apprenticeship. \n \nThis agreement in two parts made & entered into this 1"" day of October A. D. 1866 by & between Mrs Kate V. Chambliss of the first part and 1st Lieut Geo. W. Rollins Vet Res. Corps U.S.A. Agent for Carroll Parish La. Bur of Ref. Freedn & Aband Lands La and by virtue of authority contained in Circular No 25 dated Hd. Qrs. Bur Ref Freedn & Abd Lands New Orleans Louisiana Octr 31"" 1865 Guardian for Minors & orphans of Freedmen for Carroll Parish La party hereto of the second part. Witnesseth That James Samenett, Betty Taylor and Mary Taylor minor orphans of African decent are hereby bound & apprenticed to service to the said Mrs Kate V. Chambliss party of the first part & undersigned during their years of minority commencing & ending as follows: James Samenett aged 14 years, commencing on the 1"" day of Oct. A. D. 1866 & terminating on the 1"" day of October 1870. Betty Taylor aged 9 years commencing on the 1"" day of Oct. A. D. 1866 & ending on the 1"" day of Oct. A. D. 1872. Mary Taylor aged 7 years commencing on the 1"" day of October A. D. 1866 & terminating on the 1"" day of October A. D. 1874. And it",https://transcription.si.edu/transcribe/11406/NMAAHC-004567395_00440,0.0
2,11406,Contracts,Apprenticeship Agreement,"is agreed on the part of the party of the first part & undersigned Mrs Kate V. Chambliss that for the consideration of the faithfull services to be rendered by the within named & said minor orphans the said minor orphans shall receive comfortable clothing, board medical treatment when sick, a reasonable amount of schooling and permission to attend church each Sabbath and at the end or termination of their term of apprenticeship the said minor orphans shall be allowed to retain all articles of their personal apparel. \n \nThe said parties do hereby mutually agree that all laws or parts of laws enacted or that may be enacted by the United States or the State of Louisiana establishing laws for the welfare or government of Freedmens minors or orphans or for the government of the same under laws establishing the Bur of Ref Freed & Ab^d Lands or in any way affecting the provisions of this agreement is & shall be made a part of this agreement and that all laws applying to white minors or orphans shall be extended to the said & within named minor orphans during the conti-",https://transcription.si.edu/transcribe/11406/NMAAHC-004567395_00441,0.0
3,11406,Contracts,Apprenticeship Agreement,"continuance of this agreement & finally that this agreement shall expire & terminate when the aforesaid minor male orphan shall have arrived at the age of eighteen years and the minor female orphans shall have arrived at the age of fifteen years respectively. \n \nIn testimony whereof the said parties have hereunto affixed their names to this agreement. Done at Lake Providence Louisiana Parish of Carroll on the third day of November A. D. 1866. \n \nsigned Kate V. Chambliss \nsigned Geo. W Rollins \n1st Lieut. VRC. U.S. Army \nAgent for Carrol Parish \nLa. Bureau of R. F. & A Lands \nLouisiana \n \nExecuted in Presence of \nSigned Ben C. Johnson \nJackson Chambliss \nJohn A. Ginst [[?]] \n \n[[image - three boxes oriented vertically representing Internal Revenue Stamps that appeared on original document, reading as follows: 2¢ / Int Rev / [[illegible]]; a single illegible initial appears beneath each ""stamp""]] \n \nI certify the above to be a true copy of the original Indenture in possession of Mrs Kate V. Chambliss. \n \nTho H. Hay \n1st Lieut 42d Infty \nA S. A. Comr",https://transcription.si.edu/transcribe/11406/NMAAHC-004567395_00442,0.0
4,15369,Contracts,Apprenticeship Agreement,"[[preprinted]] \nBureau of Refugees, Freedman and Abandoned Lands, \nHead-Quarters, Asst. Commissioner, State of North Carolina \nRaleigh, N.C., ^[[March 2""]] 186^[[6]] [[/preprinted]] \n \nJim Tew (Colored) having made statements that he and his brother Charley & Joe have been bound by one Richd Holmes of Sampson County to one Lewis Tew. - Know all by these that the said Richd Holmes has no authority for binding or apprenticing minors, and that all indentures that may have been made by him are null and void. the aforesaid Jim Tew and his brother Charley and Joe are free to engage in any contract or to work with such as may wish to hire them. \n \nAll officers of the Bureau, Capts of Police or others are requested to see that these men are protected in their rights. \n \nBy order of Col. Whittlesey, Asst. Commissioner \n \nLieut. and A. A. A. Genl.",https://transcription.si.edu/transcribe/15369/NMAAHC-004567415_00230,1.0


In [None]:
# Step 1 & 2: Group by 'merge_id' and concatenate 'transcription_text'
aggregated_text = aa.groupby('merge_id')['transcription_text'].agg('\n'.join).reset_index()

# Step 3: Drop duplicate 'merge_id' rows, keeping the first occurrence
aa = aa.drop_duplicates(subset='merge_id')

# Step 4: Merge the aggregated text back into the original DataFrame
aa = pd.merge(aa, aggregated_text, on='merge_id', how='left', suffixes=('', '_aggregated'))

# Now, 'transcription_text_aggregated' contains the concatenated text
# You might want to rename or drop the original 'transcription_text' column as needed
aa.drop(columns=['transcription_text'], inplace=True)
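
The group, de-duplicate, and merge steps above can also be written as a single named aggregation. The sketch below shows that alternative on a small made-up frame; the column values are placeholders, and keeping only `document_url` alongside the text is an assumption made to keep the example short.

```python
import pandas as pd

# Made-up pages: two belong to document 0, one to document 1
pages = pd.DataFrame({
    "merge_id": [0, 0, 1],
    "document_url": ["url-a", "url-b", "url-c"],
    "transcription_text": ["first page", "second page", "single page"],
})

# One row per merge_id: keep the first URL and join the page texts in row order
merged = pages.groupby("merge_id", as_index=False).agg(
    document_url=("document_url", "first"),
    transcription_text_aggregated=("transcription_text", "\n".join),
)

print(merged)
```

Either approach yields one row per `merge_id` with the page texts concatenated in their original order.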

In [None]:
aa.to_csv('./input/merged_documents.csv', index=False)

## Task #3: Grab Apprenticeship info from CSV

In [None]:
df = pd.read_csv('./input/merged_documents.csv')

49

In [None]:
system_message = textwrap.dedent("""
  From the following historical document text, extract the following items and return them as JSON in this format:
  {{ "results":
  [{{
    "official": "<Freedmen Bureau / Gov Official / Military Official Name>",
    "mentor": "< Person agreeing to take in apprentice >",
    "apprentice_name": "<Name of apprentice>", # The full name (if known) of the apprentice, use first name if that is all that is known
    "apprentice_age": <Age of the apprentice>, # int value (years), null if not stated
    "state": "< State contract is made >", # If not stated, use null
    "county": "< County contract is made >" # If not stated, use null
  }},
  {{<ITEM 2>}}, {{<ITEM 3>}}, ...]
  }}
  So for the following text:
  "Roger Jones agrees to take in apprentice John Smith, aged 14, in the state of Mississippi, in the county of Copiah with witness John Doe"

  the output text generated would be:
  {{ "results":
  [{{
    "official": "John Doe",
    "mentor": "Roger Jones",
    "apprentice_name": "John Smith",
    "apprentice_age": 14,
    "state": "Mississippi",
    "county": "Copiah"
  }}]
  }}

  IMPORTANT: Return only the JSON object mapping "results" to a list of objects and nothing more. Use the JSON value null (not None) for missing values. If there are no matches, return "results" as an empty list.

  If there are multiple matches, list the full JSON details for each as an item in the results array.
  """
)

In [None]:
chat = ChatOpenAI()
# chat = ChatOllama(model="llama2", format="json")

def get_output(text):
    # No few-shot pair here: outputs[0] is a payment record from Task #1 and does not match this schema
    messages = [SystemMessage(content=system_message), HumanMessage(content=text)]
    output = chat.invoke(messages)
    return output.content

def parse_output(output):
    print(output)
    return json.loads(output)

inputs = df['transcription_text_aggregated'].tolist()
print(inputs[33])

test = get_output(inputs[33])
answer = parse_output(test)

answer["results"]


Indenture 
By Capt P.J. Hawk 
Binding 
Ann Hawk 
to  
Nathaniel Hawk 
Caswell County 
N.C. 
October 27-65
{
  "results": [
    {
      "official": "Capt P.J. Hawk",
      "mentor": "Nathaniel Hawk",
      "apprentice_name": "Ann Hawk",
      "apprentice_age": None,
      "state": "N.C.",
      "county": "Caswell County"
    }
  ]
}


JSONDecodeError: Expecting value: line 7 column 25 (char 154)
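
The error above comes from the bare `None` in the response: JSON spells a missing value as `null`, so `json.loads` stops at line 7 of the reply. One possible workaround, sketched below, is to fall back to `ast.literal_eval` when strict JSON parsing fails, since the reply is a valid Python literal. `parse_model_output` is a hypothetical helper, and the `raw` string is copied from the response printed above.

```python
import ast
import json


def parse_model_output(raw):
    """Parse a model reply as JSON, falling back to a Python literal (e.g. None instead of null)."""
    raw = raw.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # literal_eval accepts None/True/False and single-quoted strings, and never executes code
        return ast.literal_eval(raw)


# Reply copied from the cell output above
raw = """
{
  "results": [
    {
      "official": "Capt P.J. Hawk",
      "mentor": "Nathaniel Hawk",
      "apprentice_name": "Ann Hawk",
      "apprentice_age": None,
      "state": "N.C.",
      "county": "Caswell County"
    }
  ]
}
"""

answer = parse_model_output(raw)
print(answer["results"][0]["apprentice_age"])
```

The fallback does not handle a reply that mixes both styles (for example `null` next to `None`). Asking the model for `null` in the prompt, or using a JSON response mode where the model supports one, addresses the problem at the source.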