## **Gen AI Tax Optimization Assistant**


**Problem Statement:** Develop an intelligent tax optimization assistant using Gen AI technologies to provide personalized tax-saving recommendations and strategies based on user financial data and up-to-date tax regulations.
<br>

The assistant should use Large Language Models (LLMs) to interpret complex tax laws and Retrieval-Augmented Generation (RAG) to generate actionable recommendations. The system will integrate with a vector database for efficient information retrieval and use LangChain to orchestrate interactions between components.

## Project Methodology

1. Gather user financial data including income, expenses, deductions, and previous tax returns. Ensure this data is structured and anonymized.
2. Clean and normalize the user financial data to ensure consistency, extract the key features relevant to tax optimization, and store them in a vector database (Chroma DB).
3. Integrate RAG to combine the LLM’s capabilities with a retrieval system that pulls relevant information from the vector database.
4. Use the RAG framework to tailor the recommendations by retrieving the tax-saving tips and regulations relevant to each user’s financial situation.
5. Monitor and measure the performance of the system, including response time, accuracy of recommendations, and user satisfaction.
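Step 5 is not implemented in the cells that follow. As a minimal sketch of the response-time part only, the helper below wraps any callable and records how long it took. The names `timed_call` and `summarize_latencies` are hypothetical and do not come from the notebook or from LangChain; accuracy and user satisfaction would need separate evaluation data.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def summarize_latencies(latencies):
    """Return count, mean and max for a non-empty list of latencies in seconds."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "mean": sum(ordered) / len(ordered),
        "max": ordered[-1],
    }


# Toy usage with a stand-in function; in the notebook, fn could be the
# recommendation function defined in a later cell.
answer, seconds = timed_call(lambda text: text.upper(), "sample query")
print(answer, f"{seconds:.6f}s")
```

Collecting `seconds` over many queries and passing the list to `summarize_latencies` gives a rough latency profile for the system.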

In [None]:
!pip install faker


In [12]:
import random

import numpy as np
import pandas as pd
from faker import Faker

num_user = 30
fake = Faker()


# Synthetic data generation
def generate_financial_data(num_user):
    """Return a DataFrame of randomly generated financial records, one row per user."""
    data = {
        'User_ID': list(range(1, num_user + 1)),
        'Income': np.random.uniform(30000, 150000, num_user).round(2),
        'Expenses': np.random.uniform(5000, 50000, num_user).round(2),
        'HealthInsurance': np.random.uniform(0, 5000, num_user).round(2),
        'HomeLoan': np.random.uniform(0, 10000, num_user).round(2),
        'ELSS': np.random.uniform(0, 5000, num_user).round(2),
        'NPS': np.random.uniform(0, 5000, num_user).round(2),
        'PPF': np.random.uniform(0, 5000, num_user).round(2),
        'HouseRent': np.random.uniform(0, 12000, num_user).round(2),
        'Previous_Tax_Amount': np.random.uniform(2000, 20000, num_user).round(2),
        'State': [fake.state_abbr() for _ in range(num_user)],
        'Filing_Status': [random.choice(['Single', 'Married', 'Head of Household']) for _ in range(num_user)],
        'Tax_Credits': np.random.uniform(0, 5000, num_user).round(2)
    }
    return pd.DataFrame(data)


# Alias preserving the misspelled name that the later cells call
gererate_financial_data = generate_financial_data


In [13]:
financial_data = gererate_financial_data(num_user)
financial_data.head()


Unnamed: 0,User_ID,Income,Expenses,HealthInsurance,HomeLoan,ELSS,NPS,PPF,HouseRent,Previous_Tax_Amount,State,Filing_Status,Tax_Credits
0,1,50504.16,42975.37,251.25,9315.5,625.91,855.78,3653.11,3362.93,7014.26,HI,Single,3346.26
1,2,63380.26,19238.32,4157.7,4571.79,4689.07,2861.34,4985.18,5215.45,17945.35,ID,Head of Household,1446.44
2,3,138240.51,32870.23,2503.33,8119.89,488.98,1112.97,3842.32,3525.97,13671.1,SD,Married,3005.49
3,4,130442.52,35557.2,874.74,5276.58,2135.76,3814.44,1317.92,7608.43,17613.13,PA,Single,3597.46
4,5,79711.59,10591.34,1157.17,1148.78,496.49,2269.45,2753.88,5206.35,17059.62,WV,Married,4959.57


In [14]:
import pandas as pd

def generate_tax_regulations():
    tax_brackets = ['10% - $0 to $10,000', '12% - $10,001 to $40,000', '22% - $40,001 to $85,000',
                    '24% - $85,001 to $160,000', '32% - $160,001 to $200,000', '35% - $200,001 and above']
    standard_deductions = [12000] * len(tax_brackets)
    tax_credits = [500, 1000, 1500, 2500, 3000, 4500]

    regulations = {
        'Tax_Bracket': tax_brackets,
        'Standard_Deductions': standard_deductions,
        'Tax_Credits': tax_credits
    }
    df = pd.DataFrame(regulations)
    return df

tax_regulations = generate_tax_regulations()
tax_regulations

Unnamed: 0,Tax_Bracket,Standard_Deductions,Tax_Credits
0,"10% - $0 to $10,000",12000,500
1,"12% - $10,001 to $40,000",12000,1000
2,"22% - $40,001 to $85,000",12000,1500
3,"24% - $85,001 to $160,000",12000,2500
4,"32% - $160,001 to $200,000",12000,3000
5,"35% - $200,001 and above",12000,4500
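The `Tax_Bracket` column stores each bracket as a text label, while the tax calculation in the next cell hardcodes the same thresholds. As a sketch of how the labels could be turned into numbers instead, the hypothetical helper `parse_bracket` below extracts the rate and bounds with a regular expression. It assumes every label follows the `'<rate>% - $<lower> to $<upper>'` or `'<rate>% - $<lower> and above'` pattern shown in the table.

```python
import re


def parse_bracket(label):
    """Parse e.g. '12% - $10,001 to $40,000' into (rate, lower, upper).

    upper is None for an open-ended bracket such as '35% - $200,001 and above'.
    """
    rate_match = re.match(r"\s*(\d+(?:\.\d+)?)%", label)
    amounts = [int(a.replace(",", "")) for a in re.findall(r"\$([\d,]+)", label)]
    if rate_match is None or not amounts:
        raise ValueError(f"Unrecognized bracket label: {label!r}")
    rate = float(rate_match.group(1)) / 100
    lower = amounts[0]
    upper = amounts[1] if len(amounts) > 1 else None
    return rate, lower, upper


print(parse_bracket("12% - $10,001 to $40,000"))   # (0.12, 10001, 40000)
print(parse_bracket("35% - $200,001 and above"))   # (0.35, 200001, None)
```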


In [23]:
def apply_tax_regulations(financial_df, regulations_df):
    # Simplified model for applying tax brackets and deductions
    def calculate_tax(user_income, deductions, standard_deduction):
        # Pick a single tax rate from gross income (flat rate, not marginal)
        if user_income <= 10000:
            tax_rate = 0.10
        elif user_income <= 40000:
            tax_rate = 0.12
        elif user_income <= 85000:
            tax_rate = 0.22
        elif user_income <= 160000:
            tax_rate = 0.24
        elif user_income <= 200000:
            tax_rate = 0.32
        else:
            tax_rate = 0.35

        # Assume the standard deduction applies regardless of filing status
        taxable_income = max(user_income - deductions - standard_deduction, 0)
        return taxable_income * tax_rate

    deduction_columns = ['HealthInsurance', 'HomeLoan', 'ELSS', 'NPS', 'PPF', 'HouseRent']

    # Calculate estimated tax for each user (adds a column to financial_df in place)
    standard_deduction = regulations_df['Standard_Deductions'].iloc[0]
    financial_df['Estimated_Tax'] = financial_df.apply(
        lambda row: calculate_tax(row['Income'], row[deduction_columns].sum(), standard_deduction),
        axis=1,
    )
    return financial_df



In [24]:
num_users = 1000
financial_data = gererate_financial_data(num_users)

# Apply tax regulations to the financial data
financial_data_with_taxes = apply_tax_regulations(financial_data, tax_regulations)
financial_data_with_taxes.head()

Unnamed: 0,User_ID,Income,Expenses,HealthInsurance,HomeLoan,ELSS,NPS,PPF,HouseRent,Previous_Tax_Amount,State,Filing_Status,Tax_Credits,Estimated_Tax
0,1,67608.1,36234.88,4219.2,6565.79,463.8,947.06,1902.42,1686.47,2934.39,WI,Single,3173.43,8761.1392
1,2,90542.35,34696.51,2877.31,6533.5,1441.23,4182.14,89.15,3258.87,17405.18,LA,Head of Household,1628.47,14438.436
2,3,43624.98,21841.17,4557.74,569.74,3338.56,3570.39,277.33,10550.91,12084.52,MI,Married,3448.52,1927.2682
3,4,95943.15,9694.51,830.39,4306.05,4221.37,1739.74,4325.62,873.55,19060.82,UT,Head of Household,7.98,16235.1432
4,5,98321.72,37798.35,2102.78,994.29,150.64,4862.61,3184.79,8188.14,8637.5,NM,Head of Household,3714.76,16041.2328
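The first row of this output can be checked by hand. The sketch below reproduces the notebook's simplified rule (one rate chosen from gross income, applied to income minus itemized deductions and the standard deduction) and, for contrast, a progressive calculation in which each slice of taxable income is taxed at its own bracket rate. The names `flat_rate_tax` and `marginal_tax` are illustrative, and the progressive version is not what the notebook computes; it also treats the bracket boundaries as continuous, which is an assumption.

```python
# (upper bound, rate) pairs matching the thresholds hardcoded in calculate_tax
BRACKETS = [
    (10_000, 0.10), (40_000, 0.12), (85_000, 0.22),
    (160_000, 0.24), (200_000, 0.32), (float("inf"), 0.35),
]
STANDARD_DEDUCTION = 12_000


def flat_rate_tax(income, deductions):
    """Notebook rule: a single rate, selected by gross income."""
    rate = next(r for upper, r in BRACKETS if income <= upper)
    taxable = max(income - deductions - STANDARD_DEDUCTION, 0)
    return taxable * rate


def marginal_tax(income, deductions):
    """Progressive alternative: tax each slice of taxable income at its own rate."""
    taxable = max(income - deductions - STANDARD_DEDUCTION, 0)
    tax, lower = 0.0, 0.0
    for upper, rate in BRACKETS:
        if taxable <= lower:
            break
        tax += (min(taxable, upper) - lower) * rate
        lower = upper
    return tax


# Values from row 0 of the table above (User_ID 1, Income 67608.1)
row0_deductions = 4219.2 + 6565.79 + 463.8 + 947.06 + 1902.42 + 1686.47
print(round(flat_rate_tax(67608.1, row0_deductions), 4))  # 8761.1392, as in Estimated_Tax
print(round(marginal_tax(67608.1, row0_deductions), 4))
```

For this user the taxable income is 67608.10 - 15784.74 - 12000 = 39823.36, and the flat rule multiplies all of it by 22% because the gross income falls in the third bracket.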


## **Document Preparation for ChromaDB**


In [None]:
!pip install langchain langchain_community

!pip install chromadb


In [26]:
from langchain.docstore.document import Document
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma





In [27]:
# Prepare documents for LangChain
documents = []
for _, row in financial_data_with_taxes.iterrows():
    content = (f"User_ID: {row['User_ID']}, Income: {row['Income']}, Expenses: {row['Expenses']}, "
               f"HealthInsurance: {row['HealthInsurance']}, HomeLoan: {row['HomeLoan']}, "
               f"ELSS: {row['ELSS']}, NPS: {row['NPS']}, PPF: {row['PPF']}, HouseRent: {row['HouseRent']}, "
               f"Previous_Tax_Amount: {row['Previous_Tax_Amount']}, State: {row['State']}, "
               f"Filing_Status: {row['Filing_Status']}, Tax_Credits: {row['Tax_Credits']}, "
               f"Estimated_Tax: {row['Estimated_Tax']}")

    documents.append(Document(page_content=content))
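The loop above spells out every column by hand. As a sketch, the same `column: value` text can be produced generically from any mapping, which avoids editing the f-string when columns change. `row_to_text` is a hypothetical helper name; it works on a plain dict and should also accept a pandas row, since a Series supports `keys()` and item access.

```python
def row_to_text(row, columns=None):
    """Render a record as 'col: value, col: value, ...' in column order."""
    if columns is None:
        columns = list(row.keys())
    return ", ".join(f"{col}: {row[col]}" for col in columns)


sample_row = {"User_ID": 1, "Income": 67608.1, "State": "WI", "Filing_Status": "Single"}
print(row_to_text(sample_row))
# User_ID: 1, Income: 67608.1, State: WI, Filing_Status: Single
```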

In [28]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) y
Traceback (most recent call last):
  File "/usr/local/bin/huggingface-cli", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/commands/huggingface_cli.py", line 52, in main
    servic

In [None]:
!pip install sentence_transformers

In [37]:
hg_embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
persist_directory = 'vectordb'

# Embed every document and store the vectors in a persistent Chroma collection
langchain_chroma = Chroma.from_documents(
    documents=documents,
    embedding=hg_embeddings,
    collection_name='financial_data',
    persist_directory=persist_directory,
)
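Conceptually, the vector store keeps one embedding per document and answers a query by returning the documents whose embeddings are closest to the query embedding. The toy sketch below illustrates that idea with exact cosine similarity over hand-written three-dimensional vectors. It is only an illustration: Chroma's actual index structure and distance metric may differ, and real `all-MiniLM-L6-v2` embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return the indices of the k document vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Made-up vectors standing in for document embeddings
toy_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_query = [0.8, 0.0, 0.1]
print(top_k(toy_query, toy_docs, k=2))  # [0, 1]
```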

## **Model Setup Using LangChain**

In [87]:
from langchain.llms import HuggingFaceHub

model_id = 'HuggingFaceH4/zephyr-7b-beta'

In [108]:
from google.colab import userdata
hg_api =  userdata.get('hugginface_key')

In [109]:
import os
os.environ['HUGGINGFACEHUB_API_TOKEN'] = hg_api

In [111]:
model = HuggingFaceHub(
    repo_id=model_id,
    model_kwargs={"temperature": 0.1, "max_new_tokens": 512},
    huggingfacehub_api_token=hg_api,
)


In [112]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

In [113]:
template = """
Based on the following financial data and tax regulations, analyze and provide personalized tax-saving recommendations:
Financial Data: {question}
Context: {context}
Answer:
"""
PROMPT = PromptTemplate(template=template, input_variables=["question", "context"])
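To make the two placeholders concrete, the sketch below fills a copy of the template with plain `str.format`, which is roughly what happens before the text reaches the model. `demo_template` and `fill_prompt` are illustrative names, and joining the retrieved records with blank lines is an assumption based on how the context appears in the printed response later in this notebook.

```python
demo_template = (
    "Based on the following financial data and tax regulations, "
    "analyze and provide personalized tax-saving recommendations:\n"
    "Financial Data: {question}\n"
    "Context: {context}\n"
    "Answer:\n"
)


def fill_prompt(question, retrieved_texts):
    """Substitute the user query and the retrieved records into the template."""
    context = "\n\n".join(retrieved_texts)
    return demo_template.format(question=question, context=context)


print(fill_prompt("Analyze - User_ID: 1, Income: 67608.1",
                  ["User_ID: 2, Income: 90542.35", "User_ID: 3, Income: 43624.98"]))
```

Note that `{question}` carries the user's own record while `{context}` carries records of other, similar users, so the model sees both under one instruction.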

In [114]:
retriever = langchain_chroma.as_retriever(search_kwargs = {"k" : 5})

In [115]:
qa_chain = RetrievalQA.from_chain_type(
    llm=model, retriever=retriever, chain_type_kwargs={"prompt": PROMPT}
)

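def remove_duplicates(docs):
    # Assumed behavior: keep the first occurrence of each distinct
    # page_content, preserving retrieval order
    seen = set()
    unique_docs = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique_docs.append(doc)
    return unique_docs


def get_tax_optimization_recommendations(query):
    # Retrieve documents
    raw_docs = retriever.get_relevant_documents(query)

    # Remove duplicates
    unique_docs = remove_duplicates(raw_docs)

    # Prepare the context for the prompt
    context = " ".join([doc.page_content for doc in unique_docs])

    # Use the QA chain to get the response. RetrievalQA is expected to run its
    # own retrieval for "query", so the extra "context" entry is likely unused.
    result = qa_chain({"context": context, "query": query})
    return result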

In [116]:
!huggingface-cli logout

Successfully logged out.


In [117]:
!huggingface-cli login



    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) y
Token is valid (permission: fineGrained).
[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in yo

In [None]:
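# Query text as echoed in the "Financial Data" line of the printed response
query = ("Analyze - User_ID: 317, Income: 65185.29, Expenses: 6770.46, HealthInsurance: 1921.03, "
         "HomeLoan: 0.0, ELSS: 0.0, NPS: 1767.37, PPF: 1927.76, HouseRent: 3657.13, "
         "Previous_Tax_Amount: 15957.37, State: VI, Filing_Status: Head of Household, "
         "Tax_Credits: 2990.91, Estimated_Tax: 9660.")
response = get_tax_optimization_recommendations(query)
print(response['result'])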

Based on the following financial data and tax regulations, analyze and provide personalized tax-saving recommendations:
Financial Data: Analyze - User_ID: 317, Income: 65185.29, Expenses: 6770.46, HealthInsurance: 1921.03, HomeLoan: 0.0, ELSS: 0.0, NPS: 1767.37, PPF: 1927.76, HouseRent: 3657.13, Previous_Tax_Amount: 15957.37, State: VI, Filing_Status: Head of Household, Tax_Credits: 2990.91, Estimated_Tax: 9660.
Context: User_ID: 458, Income: 89959.56, Expenses: 25375.02, HealthInsurance: 1959.43, HomeLoan: 1444.54, ELSS: 2430.62, NPS: 1665.6, PPF: 515.7, HouseRent: 9758.05, Previous_Tax_Amount: 13904.35, State: AS, Filing_Status: Head of Household, Tax_Credits: 2350.82, Estimated_Tax: 14444.548799999999

User_ID: 615, Income: 95093.8, Expenses: 23123.7, HealthInsurance: 31.13, HomeLoan: 4639.11, ELSS: 2770.15, NPS: 2967.47, PPF: 1924.8, HouseRent: 10601.14, Previous_Tax_Amount: 18554.82, State: NE, Filing_Status: Head of Household, Tax_Credits: 1939.13, Estimated_Tax: 14438.4

User_ID: 234, Income: 134140.28, Expenses: 29875.57, HealthInsurance: 3174.4, HomeLoan: 903.55, ELSS: 2590.33, NPS: 29.88, PPF: 1958.0, HouseRent: 3840.94, Previous_Tax_Amount: 3851.51, State: IN, Filing_Status: Head of Household, Tax_Credits: 1833.99, Estimated_Tax: 26314.363199999996

User_ID: 663, Income: 95942.32, Expenses: 45816.47, HealthInsurance: 2491.53, HomeLoan: 9404.96, ELSS: 3786.02, NPS: 3639.12, PPF: 4915.87, HouseRent: 7290.31, Previous_Tax_Amount: 18356.14, State: MH, Filing_Status: Head of Household, Tax_Credits: 1957.02, Estimated_Tax: 12579.482400000003

User_ID: 203, Income: 116027.95, Expenses: 17824.29, HealthInsurance: 1997.13, HomeLoan: 320.88, ELSS: 2050.06, NPS: 3283.7, PPF: 1054.12, HouseRent: 9159.14, Previous_Tax_Amount: 19398.68, State: SC, Filing_Status: Single, Tax_Credits: 1352.53, Estimated_Tax: 20679.1008
Answer:

Based on the provided financial data and tax regulations, here are some personalized tax-saving recommendations for each user:

1. User_ID: 317
   - Maximize ELSS investments: The user has not invested in ELSS (Equity-Linked Savings Scheme) yet. ELSS is a tax-efficient investment option that offers tax deductions under Section 80C. The user can invest up to Rs. 1.5 lakh in ELSS to reduce their taxable income.
   - Claim tax credits: The user has claimed tax credits worth Rs. 2,990.91. However, they may be eligible for additional tax credits under various schemes such as the Rajiv Gandhi Equity Savings Scheme (RGESS) or the Pradhan Mantri Vaya Vandana Yojana (PMVVY). The user should check their eligibility and claim these credits to further reduce their tax liability.

2. User_ID: 458
   - Maximize ELSS and NPS investments: The user has invested in ELSS and NPS (National Pension System), but they can still increase their investments to maximize tax benefits. The user can invest up to Rs. 1.5 lakh in ELSS and up to Rs. 50,000 in NPS (under Section 80CCD) to reduce their taxable income.
   - Claim tax credits: The user has claimed tax credits worth Rs. 2,350.82. However, they may be eligible for additional tax credits under various schemes such as the Rajiv Gandhi Equity Savings Scheme (RGESS) or the Pradhan Mantri Vaya Vandana Yojana (PMVVY). The user should check their eligibility and claim these credits to further reduce their tax liability.

3. User_ID: 615
   - Maximize ELSS and NPS investments: The user has invested in ELSS and NPS, but they can still increase their investments to maximize tax benefits. The user can invest up to Rs. 1.5 lakh in ELSS and up to Rs. 50,000 in NPS (under Section 80CCD) to reduce their taxable income.
  