# About:

Prepare the data to populate the 'TexteLégalExactCode' column of the 'Textes' table.

In French-speaking countries:
Legislation includes Acts (Lois) and Ordinances (Ordonnances):
- A Loi is a text written and passed by parliament.
- An Ordonnance is a text issued by the government in a field normally reserved for statute law; it must be ratified by parliament.
- In some countries (mainly the DRC), there have also been Ordonnances-Lois and Décrets-Lois. I think these are older names for Ordonnances.
Regulations include Decrees (Décrets) and Orders (Arrêtés):
- A Décret is a text issued by the government in the Council of Ministers.
- An Arrêté ministériel is a text issued by a single minister. An Arrêté interministériel is a text issued jointly by several ministers.
- A Circulaire (Circular) or Note de service (Memo) is a text issued by the administration.

In English-speaking countries (I'm less familiar with these):
Legislation includes Acts and Ordinances:
- An Act is a text written and passed by parliament.
- In some countries, there have also been Ordinances. These sometimes seem to be local government texts.
- In some countries, there have also been Laws. I think that is an older name for Acts.
Regulations include Legal Instruments, Legal Notices and Orders (the exact names depend on the country):
- A Statutory Instrument, Legislation Instrument, Legal Instrument, Government Notice or Legal Notice is a text issued by the government.
- An Order seems to come from a minister.

Above all, however, we need to adapt to the practices of each country. The easiest way is probably to first extract the full name of the text, for example: Ordonnance n°2019-022/P-RM du 27 septembre 2019 portant code minier en République du Mali; Minerals and Mining Act, 2006 (Act 703 of Ghana). The full name is where the exact nature of the text appears.
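
As a rough illustration of reading the nature of a text from its full name, here is a minimal sketch for French-style names, where the nature comes first. The helper `nature_from_full_name` and the list `TEXT_NATURES` are hypothetical names introduced for this example. English-style names such as "Minerals and Mining Act, 2006" put the nature elsewhere and are not handled here.

```python
import re

# Longest names first, so that "Ordonnance-Loi" wins over "Ordonnance".
TEXT_NATURES = [
    "Arrêté interministériel", "Arrêté ministériel", "Ordonnance-Loi",
    "Décret-Loi", "Ordonnance", "Circulaire", "Décret", "Arrêté", "Loi",
]
_CANONICAL = {name.casefold(): name for name in TEXT_NATURES}

# The nature is expected at the very start of the name, optionally after the
# "img_" prefix that some titles carry in this notebook.
_NATURE_RE = re.compile(
    r"^(?:img_)?\s*(" + "|".join(re.escape(n) for n in TEXT_NATURES) + r")\b",
    flags=re.IGNORECASE,
)

def nature_from_full_name(full_name):
    """Return the leading nature of a French-style full name, or None."""
    match = _NATURE_RE.match(full_name)
    if match is None:
        return None
    return _CANONICAL[match.group(1).casefold()]

print(nature_from_full_name(
    "Ordonnance n°2019-022/P-RM du 27 septembre 2019 portant code minier en République du Mali"
))
```

Because only the start of the name is inspected, a title such as "Ordonnance n°2020-013 (21.12.2020) Loi de finances" is read as an Ordonnance, even though the word "Loi" appears later in it.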


In [5]:
french_definitions = """
Loi: A law passed by the legislature (parliament) in a civil law system. It refers to a formal statute enacted by the legislative body.
Ordonnance: A type of executive order or regulation, often issued by a government authority, and sometimes used to expedite legislative procedures.
Ordonnance-Loi: A law that is issued by executive decree, usually in extraordinary situations, that has the same force as a law passed by the legislature.
Décret: An executive decree issued by the president or a government authority, implementing laws or regulations. It often provides details or guidance on how a law should be applied.
Décret-Loi: A decree issued by the executive that carries the weight of law, typically used in emergency situations or where the legislative process is bypassed.
Arrêté: A legal order issued by an administrative authority (such as a mayor or governor), generally more localized in scope than a decree.
Arrêté ministériel: An order issued by a minister, specifying how certain laws or regulations should be applied within the minister’s domain.
Arrêté interministériel: An order issued jointly by several ministers to regulate a matter that involves multiple ministries.
Circulaire: A circular issued by a government or administrative authority to guide how certain laws or regulations should be interpreted or enforced, usually without binding legal force.
"""

english_definitions = """
Act: A statute or formal written law passed by a legislative body (such as a parliament or congress). It has the highest level of legal authority.
Law: A system of rules created and enforced by governmental or social institutions. It can refer broadly to legal principles, or specifically to individual statutes.
Ordinance: A law or regulation enacted by a municipal or local government. It is more localized than national laws or Acts of Parliament.
Decree: An order issued by a government authority, often in an emergency or under special circumstances, that has the force of law without needing legislative approval.
Statutory Instrument: A form of delegated or secondary legislation made by an individual or body under powers given by an Act of Parliament. It allows the details of an Act to be filled out by regulations.
Legislation Instrument: A general term for legal documents that have the effect of law, including Acts, Statutory Instruments, and other forms of formal legislation.
Legal Instrument: A formal written document that has legal effect. It can include contracts, wills, statutes, decrees, etc.
Government Notice: A formal publication by a government to inform the public about new laws, regulations, or other official actions.
Legal Notice: A formal notification or announcement that has legal implications, often used to inform individuals or the public about legal processes, obligations, or rights.
Order: A formal directive issued by an authority (such as a court or a government official) that has legal force.
"""

In [2]:
# !pip install llama-index
# !pip install llama-index-readers-database
# !pip install llama-index-embeddings-huggingface
# !pip install llama-index-llms-ollama
# !pip install llama-index-postprocessor-cohere-rerank
# !pip install llama-index-postprocessor-flag-embedding-reranker
# !pip install FlagEmbedding
# !pip install openpyxl
# !pip install psycopg2
# !pip install pandas
# !pip install sqlalchemy

In [1]:
%reload_ext autoreload
%autoreload 2

In [37]:
from dotenv import load_dotenv
import os

load_dotenv()

True

In [38]:
# COUNTRY_NAME = 'MLI Mali'
COUNTRY_NAME = os.environ.get("COUNTRY_NAME")
SPOKEN_LANGUAGE = "French"

table_column_name = "TexteLegalExactCode"

print(COUNTRY_NAME)

BEN Bénin


In [3]:
from general_config import COUNTRY_NAMES_LIST

# validate COUNTRY_NAME and stop early if it is unknown
if COUNTRY_NAME in COUNTRY_NAMES_LIST:
    print('country name OK')
else:
    raise ValueError(f"Unknown COUNTRY_NAME: {COUNTRY_NAME!r}")

country name OK


In [6]:
text_definitions = french_definitions if "french" in SPOKEN_LANGUAGE.lower() or "portuguese" in SPOKEN_LANGUAGE.lower() else english_definitions

## Get data from Postgres

In [7]:
from postgres_connection import get_postgress_data
from sql_files import sql_files
import pandas as pd

In [8]:
df = get_postgress_data(sql_files['get_docs_per_country'].replace("%country_name%", COUNTRY_NAME))


In [9]:
df.head()

Unnamed: 0,title,content,country
0,img_Loi n°1990-037 (31.12.1990) Loi de finance...,```html\n<h1>REPUBLIQUE DU BENIN</h1>\n<h2>PRE...,BEN Bénin
1,Loi n°2008-009 (30.12.2009) Loi de finances 20...,www.Droit-Afrique.com Bénin\n\n\n\n\nBénin\n\n...,BEN Bénin
2,img_Circulaire n°2012-020 (19.01.2012) Applica...,"Cotonou, 19 JAN 2012\n\nNOTE CIRCULAIRE\n\nLa ...",BEN Bénin
3,img_Circulaire n°2020-002 (03.01.2020) Applica...,MINISTÈRE\nDE L'ÉCONOMIE\nET DES FINANCES\nRÉP...,BEN Bénin
4,img_Loi n°2017-040 (29.12.2017) Loi de finance...,<p>RÉPUBLIQUE DU BÉNIN<br>\nFraternité-Justice...,BEN Bénin


In [10]:
df_leg_exacts = get_postgress_data(sql_files['get_textes_legaux_exacts'], db='Ferdi')


In [11]:
df_leg_exacts.head()

Unnamed: 0,TexteLégalExactCode,TexteLégalStandardCode,TexteLégalExactCodeCourt,TexteLégalExactComplet
0,Lég_Loi_Loi,Lég_Loi,Loi,Loi
1,Lég_Ordonnance_Ordonnance,Lég_Ordonnance,Ordonnance,Ordonnance
2,Lég_Ordonnance_OrdonnanceLoi,Lég_Ordonnance,OrdonnanceLoi,Ordonnance-Loi
3,Rég_Décret_Décret,Rég_Décret,Décret,Décret
4,Rég_Décret_DécretLoi,Rég_Décret,DécretLoi,Décret-Loi


In [12]:
df_leg_exacts['TexteLégalExactComplet'] = df_leg_exacts['TexteLégalExactComplet'].str.lower()

# RAG approach

## Get data from Postgres using llama-index db reader

In [7]:
from postgres_connection import psql_conn_config
from llama_index.readers.database import DatabaseReader
from sql_files import sql_files

In [8]:
db = DatabaseReader(
    scheme="postgresql",  # Database Scheme
    host=psql_conn_config.get("HOSTNAME"),  # Database Host
    port="5432",  # Database Port
    user=psql_conn_config.get("USERNAME"),  # Database User
    password=psql_conn_config.get("PASSWORD"),  # Database Password
    dbname=psql_conn_config.get("DATABASE"),  # Database Name
)

### Load the data as llama_index documents

In [10]:
from llama_index.core import Document

documents = db.load_data(query=sql_files['get_docs_per_country'].replace("%country_name%", COUNTRY_NAME))

In [11]:
documents[0].dict()

{'id_': 'cd44bd33-f63f-4c17-ad88-52b3e85e18fe',
 'embedding': None,
 'metadata': {},
 'excluded_embed_metadata_keys': [],
 'excluded_llm_metadata_keys': [],
 'relationships': {},
 'text': 'title: img_JO 1997 n°013 (15.07.1997) (SGG), content: JOURNAL OFFICIEL\n\nDE LA\n\nREPUBLIQUE DU MALI\n\n<table>\n  <tr>\n    <th colspan="2">TARIFS DES ABONNEMENTS</th>\n    <th>TARIFS DES INSERTIONS</th>\n    <th>OBSERVATIONS</th>\n  </tr>\n  <tr>\n    <td>1 an</td>\n    <td>6 mois</td>\n    <td rowspan="4">La ligne..................................400 F<br><br>Chaque annonce répétée.......moitié prix<br><br>Il n\'est jamais compté moins de 1.000 F pour les annonces.</td>\n    <td rowspan="4">Prix au numéro de l\'année courante................400F<br>Prix au numéro de l\'année précédente.......450F<br><br>Les demandes d\'abonnement et les annonces doivent être adressées au Secrétariat Général du Gouverne-ment-D.J.O.D.I.J.<br><br>Les abonnements prendront effet à compter de la date de paiement de le

## RAG

### Embeddings model

#### Hugging Face embedding

In [12]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding( model_name="dunzhang/stella_en_1.5B_v5", trust_remote_code=True)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


#### Ollama embedding
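Do not use this: the `llama_index.embeddings.ollama` module is not installed in this environment, so the import fails.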

In [142]:
# !pip install llama-index-embeddings-ollama

In [143]:
from llama_index.embeddings.ollama import OllamaEmbedding
embed_model = OllamaEmbedding(
    model_name="llama3.1",
    base_url="http://localhost:11434",
    # ollama_additional_kwargs=,
)

ModuleNotFoundError: No module named 'llama_index.embeddings.ollama'

### Vector DataBase

In [13]:
from llama_index.core.node_parser import TokenTextSplitter

chunk_size = 512

transformations_example = [
    TokenTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=int(chunk_size/10),
        separator=" ",
    ),
    embed_model,
]
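
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping windows over a list of tokens. It only illustrates the overlap arithmetic and is not how `TokenTextSplitter` is implemented; `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Cut tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks

# With the settings of this cell, each window advances by 512 - 51 = 461 tokens.
print(512 - int(512 / 10))
print(split_with_overlap(list(range(10)), chunk_size=4, chunk_overlap=1))
```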

In [15]:
from llama_index.llms.ollama import Ollama
from llama_index.core import Settings

# setting up the llm
llm = Ollama(model="llama3.1", request_timeout=120.0)

import os
from llama_index.core import VectorStoreIndex, load_index_from_storage
from llama_index.core.storage import StorageContext

Settings.embed_model = embed_model # we specify the embedding model to be used

# one storage directory per country, used for both saving and loading
persist_dir = f"storage_{COUNTRY_NAME}"

if not os.path.exists(persist_dir):
    index = VectorStoreIndex.from_documents(documents, transformations=transformations_example)
    # save index to disk
    index.set_index_id("vector_index")
    index.storage_context.persist(persist_dir=persist_dir)
else:
    print('loading from local')
    # rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    # load index
    index = load_index_from_storage(storage_context, index_id="vector_index")

In [None]:
# from llama_index.core import Settings
# from llama_index.core import VectorStoreIndex

# # ====== Create vector store and upload indexed data ======
# Settings.embed_model = embed_model # we specify the embedding model to be used
# index = VectorStoreIndex.from_documents(documents)

### Query Engine

In [None]:
# ====== Setup a query engine on the index previously created ======
Settings.llm = llm # specifying the llm to be used
query_engine = index.as_query_engine(streaming=True, similarity_top_k=4)

#### Query data

In [None]:
response = query_engine.query("What's the date of the document that has this title 'img_JO 1969 n°302 (01.06.1969) (SGG)' ? please respond in english.")
print(response)

# response2 = Ollama.chat(f"Using this text {response}, give me the date of the document")

# print(response2)

1 June 1969.


#### Query pipeline

In [None]:
from llama_index.core import PromptTemplate
from llama_index.core.query_pipeline import QueryPipeline

from llama_index.core.response_synthesizers import ResponseMode
from llama_index.core import get_response_synthesizer

from llama_index.core.response_synthesizers import TreeSummarize

In [None]:
from llama_index.core.query_pipeline import InputComponent
from llama_index.core.query_pipeline.query import QueryComponent

# Define your custom components
class QueryComponent1(QueryComponent):
    def run_component(self, query_str, tablename):
        # Process the query_str and tablename
        return {"processed_query": f"Processed {query_str} for {tablename}"}

class QueryComponent2(QueryComponent):
    def run_component(self, processed_query):
        # Further process the query
        return {"final_output": f"Final output: {processed_query}"}


In [160]:
retriever = index.as_retriever(similarity_top_k=5)
summarizer = TreeSummarize(llm=llm)

In [166]:
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

response_synthesizer = get_response_synthesizer(llm=llm)
retriever_query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)
lang_response = retriever_query_engine.query("Is Mali a French, English, Portuguese or Dutch speaking country? Keep the language only.")

In [167]:
print(str(lang_response))

French.


In [None]:
p = QueryPipeline() #verbose=True)
p.add_modules(
    {
        "input": InputComponent(),
        "retriever": retriever,
        "summarizer": summarizer,
    }
)
p.add_link("input", "retriever")
p.add_link("input", "summarizer", dest_key="query_str")
p.add_link("retriever", "summarizer", dest_key="nodes")

Questions:
Are these documents the right ones to analyze? Some of them do not appear in the Excel file.
How should we annotate documents that do not fit any category, e.g. /home/andrei/Ferdi/data/MLI Mali/20. Droit douanier/Tarif douanier CEDEAO 2017 amendé 2020 (DGD).pdf?


In [161]:
textes_legaux_exacts = [
    'Loi',
    'Act',
    'Law',
    'Ordonnance',
    'Ordonnance-Loi',
    'Ordinance',
    'Décret',
    'Décret-Loi',
    'Decree',
    'Statutory Instrument',
    'Legislation Instrument',
    'Legal Instrument',
    'Government Notice',
    'Legal Notice',
    'Arrêté',
    'Arrêté ministériel',
    'Arrêté interministériel',
    'Order',
    'Circulaire',
]

In [169]:
doc_name = "Tarif douanier CEDEAO 2017 amendé 2020 (DGD)"

def get_law_categ(doc_name, lang_response=str(lang_response), textes_legaux_exacts=textes_legaux_exacts):
    output = p.run(input=f"""
                Consider the content of the document that has this title '{doc_name}', and these definitions: {french_definitions if "french" in lang_response.lower() or "portuguese" in lang_response.lower() else english_definitions} 
                In which of these categories: {', '.join(textes_legaux_exacts)}, is it part of? Keep the category only.
                """)
    # output = p.run(input=f"""
    #                Considering the language the document that has this title '{doc_name}' is written in, 
    #                what type of legal document is it? 
    #                """)
    return str(output)

llm_categ_output = df.title.apply(lambda x: get_law_categ(x))
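
The model answers in free text, so the same category can come back as 'Décret-Loi' or 'Décret-Loi.', or buried in a longer explanation. A minimal sketch of a clean-up step follows; `normalize_llm_category` is a hypothetical helper and `CATEGORIES` is a copy of `textes_legaux_exacts`. It accepts only answers that consist of exactly one known category and returns None for anything else, so that verbose answers are flagged for review instead of being guessed at.

```python
CATEGORIES = [
    'Loi', 'Act', 'Law', 'Ordonnance', 'Ordonnance-Loi', 'Ordinance',
    'Décret', 'Décret-Loi', 'Decree', 'Statutory Instrument',
    'Legislation Instrument', 'Legal Instrument', 'Government Notice',
    'Legal Notice', 'Arrêté', 'Arrêté ministériel',
    'Arrêté interministériel', 'Order', 'Circulaire',
]

def normalize_llm_category(answer, categories=CATEGORIES):
    """Map a raw LLM answer to its canonical category, or None if unclear."""
    # drop surrounding whitespace, quotes and trailing punctuation
    cleaned = answer.strip(" \t\r\n.:;\"'").casefold()
    by_key = {category.casefold(): category for category in categories}
    return by_key.get(cleaned)

print(normalize_llm_category("Décret-Loi."))
print(normalize_llm_category("The category 'Ordonnance' seems to fit best."))
```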

In [173]:
df.title

0                   img_JO 1997 n°013 (15.07.1997) (SGG)
1                   img_JO 1972 n°371 (01.03.1972) (SGG)
2                   img_JO 1997 n°014 (31.07.1997) (SGG)
3                   img_JO 2011 n°035 (02.09.2011) (SGG)
4                   img_JO 2003 n°001 (10.01.2003) (SGG)
                             ...                        
213                 img_JO 1998 n°013 (15.07.1998) (SGG)
214                 img_JO 1994 n°023 (15.12.1994) (SGG)
215    Ordonnance n°2020-013 (21.12.2020) Loi de fina...
216    Loi n°2018-072 (21.12.2018) Loi de finances 20...
217         Tarif douanier CEDEAO 2017 amendé 2020 (DGD)
Name: title, Length: 218, dtype: object

#### Other Query pipeline trials

In [None]:
# try chaining basic prompts
# prompt_str = "What's the date of the document that has this title '{doc_title}' ? please respond in english."
prompt_str = "Based on the title and the content of this text '{doc_title}', in which of these categories: legislation, reglementation or convention, is it part of? please reply in French."
prompt_tmpl = PromptTemplate(prompt_str)

p = QueryPipeline(chain=[prompt_tmpl, llm], verbose=True)

In [None]:
# first prompt: introduce the document
prompt_str1 = "Consider the title and the content of this text '{doc_title}', which is a legal document from Mali."
prompt_tmpl1 = PromptTemplate(prompt_str1)
# second prompt: ask for the category, given the first response
prompt_str2 = (
    # "Please consider these definitions:\n"
    # "If it's a decree, then it's legislation."
    # "If it's an ordinance then it's a regulation."
    # "If it's a law then it's a legislation."
    # "If it's an international law it usually is a convention"
    "\n"
    "{response1}\n"
    "in which of these categories: legislation, regulation or convention, is it part of? keep the category only."
)
prompt_tmpl2 = PromptTemplate(prompt_str2)

# llm = OpenAI(model="gpt-3.5-turbo")
retriever = index.as_retriever(similarity_top_k=5)
p = QueryPipeline(
    chain=[prompt_tmpl1, llm, prompt_tmpl2, llm, retriever], verbose=True
)


In [144]:
output = p.run(input=f"""
               Consider the content of the document that has this title '{doc_name}', and these definitions: {french_definitions} 
               In which of these categories: {', '.join(textes_legaux_exacts)}, is it part of?
               """)

print(str(output))

> Running module 65c2142d-3655-4e00-90d5-438551962fbc with input: 
doc_title: 
               Consider the content of the document that has this title 'Tarif douanier CEDEAO 2017 amendé 2020 (DGD)', and these definitions: 
Loi: A law passed by the legislature (parliament) in a ...

> Running module dff56ffc-71f2-4f80-ad2b-6c941cfaa8d7 with input: 
messages: Consider the title and the content of this text '
               Consider the content of the document that has this title 'Tarif douanier CEDEAO 2017 amendé 2020 (DGD)', and these definitions: 
Loi: A...

> Running module 206d3771-9dae-457a-a72e-1052551c99b9 with input: 
response1: assistant: Based on the content and definitions provided, I would categorize "Tarif douanier CEDEAO 2017 amendé 2020 (DGD)" as follows:

The title suggests that this document is related to tariffs or ...

> Running module 3722a567-c7d1-4b5e-be7b-8677ee0

# Match by regex and title contents

In [None]:
from postgres_connection import get_postgress_data
from sql_files import sql_files
import pandas as pd

In [79]:
print(text_definitions)


Loi: A law passed by the legislature (parliament) in a civil law system. It refers to a formal statute enacted by the legislative body.
Ordonnance: A type of executive order or regulation, often issued by a government authority, and sometimes used to expedite legislative procedures.
Ordonnance-Loi: A law that is issued by executive decree, usually in extraordinary situations, that has the same force as a law passed by the legislature.
Décret: An executive decree issued by the president or a government authority, implementing laws or regulations. It often provides details or guidance on how a law should be applied.
Décret-Loi: A decree issued by the executive that carries the weight of law, typically used in emergency situations or where the legislative process is bypassed.
Arrêté: A legal order issued by an administrative authority (such as a mayor or governor), generally more localized in scope than a decree.
Arrêté ministériel: An order issued by a minister, specifying how certain l

In [20]:
def extract_kw(title):
    """Return the legal-text keywords found in the title (lowercased, sorted, comma-joined), or None."""
    result = list()
    categories = ["Loi", "Décret", "Ordonnance", "Circulaire", "Arrêté"]
    for categ in categories:
        if categ in title:
            result.append(categ.lower())

    if result:
        return ", ".join(sorted(result))
    return None

df['TexteLégalExactCode'] = df.title.apply(extract_kw)

# collapse keyword combinations into a single label;
# the labels must match the lowercased 'TexteLégalExactComplet' values used in the merge
df['TexteLégalExactCode'] = df['TexteLégalExactCode'].replace("loi, ordonnance", "ordonnance-loi")
df['TexteLégalExactCode'] = df['TexteLégalExactCode'].replace("décret, loi", "décret-loi")
df['TexteLégalExactCode'] = df['TexteLégalExactCode'].replace("circulaire, loi", "circulaire")

df['TexteLégalExactCode'].value_counts(dropna=False)


TexteLégalExactCode
loi               73
circulaire        16
ordonnance        12
arrêté            11
ordonnance-loi     8
décret             5
None               2
Name: count, dtype: int64

In [21]:
df_i = pd.merge(df, df_leg_exacts, how='left', left_on='TexteLégalExactCode', right_on='TexteLégalExactComplet', suffixes=['_', ''])
# df.drop(df_leg_exacts.columns + ['content'], axis=1, inplace=True)
df['TexteLégalExactCode'] = df_i['TexteLégalExactCode']
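
A left merge leaves the code empty, without any warning, for every label that has no exact counterpart in the reference table, for instance a misspelled label such as 'ordonance-loi'. The sketch below shows the same lookup with plain dictionaries and returns the unmatched labels, so they can be checked before anything is written back. `map_labels_to_codes` is a hypothetical helper; the reference values are the five rows shown by `df_leg_exacts.head()`.

```python
REFERENCE = {
    "Loi": "Lég_Loi_Loi",
    "Ordonnance": "Lég_Ordonnance_Ordonnance",
    "Ordonnance-Loi": "Lég_Ordonnance_OrdonnanceLoi",
    "Décret": "Rég_Décret_Décret",
    "Décret-Loi": "Rég_Décret_DécretLoi",
}

def map_labels_to_codes(labels, reference=REFERENCE):
    """Return (codes, unmatched): one code or None per label, plus the labels with no match."""
    lookup = {name.lower(): code for name, code in reference.items()}
    codes, unmatched = [], set()
    for label in labels:
        code = lookup.get(label.lower()) if label else None
        if label and code is None:
            unmatched.add(label)  # a label was extracted but the table does not know it
        codes.append(code)
    return codes, sorted(unmatched)

codes, unmatched = map_labels_to_codes(["loi", "ordonance-loi", None, "décret-loi"])
print(codes)
print(unmatched)
```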

In [22]:
df[['title', 'country', 'TexteLégalExactCode']].to_csv(f"output/texteLegaleExactCode_{COUNTRY_NAME}.csv", index=False)

## Load existing categories

In [49]:
COUNTRY_NAME = 'MLI Mali'

large_categs = pd.read_csv(f"output/texteLegaleExactCode_{COUNTRY_NAME}.csv")
# list of (title, TexteLégalExactCode) pairs
large_categs = list(large_categs.set_index('title')['TexteLégalExactCode'].to_dict().items())
# large_categs

## Update column in table

In [50]:
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker, declarative_base
# import logging

from postgres_connection import create_psql_engine

In [51]:

# # Configure logging
# logging.basicConfig(
#     level=logging.INFO,
#     filename='sqlalchemy_bulk_update.log',
#     filemode='a',
#     format='%(asctime)s - %(levelname)s - %(message)s'
# )

engine = create_psql_engine(db="Ferdi")

# Create a configured "Session" class
Session = sessionmaker(bind=engine)

# Create a Session
session = Session()

# Declare a mapping
Base = declarative_base()

class Textes(Base):
    __tablename__ = 'textes'
    
    TexteCode = Column("TexteCode", Integer, primary_key=True)
    PaysCode = Column("PaysCode", String, nullable=False)
    AnneeCodeDebut = Column("AnnéeCodeDébut", Integer)
    AnneeCodeFin = Column("AnnéeCodeFin", Integer)
    TexteLegalExactCode = Column("TexteLégalExactCode", String)
    TexteFiscExactCode = Column("TexteFiscExactCode", String)
    TexteCodeArborescence = Column("TexteCodeArborescence", String)
    TexteCourt = Column("TexteCourt", String)
    TexteComplet = Column("TexteComplet", String)

texts_table = Textes()

def bulk_update_textes(textes_to_update, column_name, texts_table=texts_table, COUNTRY_NAME=COUNTRY_NAME):
    """
    Update one column of the textes table, one row at a time.

    :param textes_to_update: list of tuples (texte_code, new_col_value)
    :param column_name: attribute name on the Textes class, e.g. 'TexteLegalExactCode'
    :param texts_table: unused, kept so that existing calls still work
    :param COUNTRY_NAME: country label such as 'MLI Mali'; its first word is used as PaysCode
    """
    for texte_code, new_col_value in textes_to_update:
        try:
            # Look up the text by its code and country
            text_title = session.query(Textes).filter_by(TexteCode=texte_code, PaysCode=COUNTRY_NAME.split(" ")[0]).one_or_none()
            if text_title:
                setattr(text_title, column_name, f"{new_col_value}")
                session.commit()
                msg_ = f"{texte_code} column {column_name} updated to {new_col_value}."
                print(msg_)
                # logging.info(f"{texte_code} column {column_name} updated to {new_col_value}.")
            else:
                msg_ = f"{texte_code} not found."
                print(msg_)
                # logging.warning(f"{texte_code} not found.")
        except Exception as e:
            session.rollback()
            msg_ = f"Error updating {texte_code}: {e}"
            print(msg_)
            # logging.error(f"Error updating {texte_code}: {e}")


In [52]:
# bulk_update_employees(textes_to_update = large_categs[0:1], )

bulk_update_textes(
    textes_to_update = large_categs, 
    column_name = table_column_name, 
    texts_table=texts_table, 
    COUNTRY_NAME=COUNTRY_NAME)

# Close the session
session.close()

Error updating img_JO 1997 n°013 (15.07.1997) (SGG): (psycopg2.errors.ForeignKeyViolation) insert or update on table "textes" violates foreign key constraint "textes_TexteLégalExactCode_fkey"
DETAIL:  Key (TexteLégalExactCode)=(nan) is not present in table "textes_legaux_exacts".

[SQL: UPDATE textes SET "TexteLégalExactCode"=%(TexteLégalExactCode)s WHERE textes."TexteCode" = %(textes_TexteCode)s]
[parameters: {'TexteLégalExactCode': 'nan', 'textes_TexteCode': 'img_JO 1997 n°013 (15.07.1997) (SGG)'}]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
Error updating img_JO 1972 n°371 (01.03.1972) (SGG): (psycopg2.errors.ForeignKeyViolation) insert or update on table "textes" violates foreign key constraint "textes_TexteLégalExactCode_fkey"
DETAIL:  Key (TexteLégalExactCode)=(nan) is not present in table "textes_legaux_exacts".

[SQL: UPDATE textes SET "TexteLégalExactCode"=%(TexteLégalExactCode)s WHERE textes."TexteCode" = %(textes_TexteCode)s]
[parameters: {'TexteLégalExactCo
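
The foreign key violations above come from rows whose category is missing: the empty CSV cell is read back as NaN, formatted as the string 'nan', and that value does not exist in "textes_legaux_exacts". A minimal sketch of a pre-check that could run before the update loop is shown below. `split_valid_updates` is a hypothetical helper, and `allowed_codes` is assumed to be the set of codes read from the reference table.

```python
import math

def split_valid_updates(pairs, allowed_codes):
    """Separate (texte_code, value) pairs into those safe to write and those to review."""
    valid, rejected = [], []
    for texte_code, value in pairs:
        is_missing = value is None or (isinstance(value, float) and math.isnan(value))
        if is_missing or value not in allowed_codes:
            rejected.append((texte_code, value))  # would break the foreign key
        else:
            valid.append((texte_code, value))
    return valid, rejected

pairs = [
    ("Loi n°2018-072", "Lég_Loi_Loi"),
    ("img_JO 1997 n°013", float("nan")),
    ("img_JO 1972 n°371", "nan"),
]
valid, rejected = split_valid_updates(pairs, allowed_codes={"Lég_Loi_Loi"})
print(valid)
print([code for code, _ in rejected])
```

Only the `valid` pairs would then be passed on for updating; the `rejected` ones correspond to the documents that still need a category.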

## Misc

### Other trials

In [13]:
def kw_to_categ(title, lang_response=str(lang_response), textes_legaux_exacts=textes_legaux_exacts):
    kws = extract_kw(title)
    output = p.run(input=f"""
                Consider these keywords {kws}, and these definitions: {french_definitions if "french" in lang_response.lower() or "portuguese" in lang_response.lower() else english_definitions} 
                If the keyword list is None return None, otherwise return one of the defined categories.
                """)
    # output = p.run(input=f"""
    #                Considering the language the document that has this title '{doc_name}' is written in, 
    #                what type of legal document is it? 
    #                """)
    return str(output)

keyword_categ_output = df.title.apply(lambda x: kw_to_categ(x))
    

NameError: name 'lang_response' is not defined

In [195]:
keyword_categ_output[1]

'Given the keywords and definitions provided, I will analyze the context information from multiple sources to determine which category best fits.\n\nBased on the text:\n\n"...Le Gouvernement est autorisé... à prendre, par ordonnances, certaines mesures qui sont normalement du domaine de la loi..."\n\nand other similar phrases throughout the documents, it seems that "ordonnances" are being used as a means to bypass or expedite legislative procedures.\n\nThe keyword list is not explicitly None, but rather a predefined set of keywords. Given this context and focusing on the definitions provided:\n\n* Ordonnance: A type of executive order or regulation\n* Décret-Loi: A decree issued by the executive that carries the weight of law (often used in emergency situations)\n\nThe category "Ordonnance" seems to fit best, as it matches the usage of "ordonnances" in the context information.'

In [190]:
df_texteLegaleExactCode = pd.DataFrame(df.title)
df_texteLegaleExactCode['keyword_categ'] = keyword_categ_output
df_texteLegaleExactCode['llm_categ_output'] = llm_categ_output

In [191]:
df_texteLegaleExactCode

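def decide_which_categ(keyword_out, llm_out):
    if not keyword_out:
        return keyword_out
    else:
        # the rule for reconciling the keyword and LLM outputs is not written yet
        raise NotImplementedError("decide_which_categ: combination rule not defined")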

Unnamed: 0,title,keyword_categ,llm_categ_output
0,img_JO 1997 n°013 (15.07.1997) (SGG),,Décret-Loi
1,img_JO 1972 n°371 (01.03.1972) (SGG),,Décret-Loi.
2,img_JO 1997 n°014 (31.07.1997) (SGG),,Décret-Loi.
3,img_JO 2011 n°035 (02.09.2011) (SGG),,Décret-Loi
4,img_JO 2003 n°001 (10.01.2003) (SGG),,Décret-Loi
...,...,...,...
213,img_JO 1998 n°013 (15.07.1998) (SGG),,Décret-Loi
214,img_JO 1994 n°023 (15.12.1994) (SGG),,Ordonnance
215,Ordonnance n°2020-013 (21.12.2020) Loi de fina...,"Loi, Ordonnance",Ordonnance-Loi
216,Loi n°2018-072 (21.12.2018) Loi de finances 20...,Loi,Décret-Loi


In [187]:
df_texteLegaleExactCode.to_excel("texteLegaleExactCode.xlsx")