
NikhilMathursProjects/Text2SQL


Text2SQL

Currently, all of the generated data already exists in the repository, in these files: cloud_costs.db, all_profiles.json, azure_cost_desc.json and aws_cost_desc.json.

Because of this, changing the CSV files in mock_data_sets has no effect: the pipeline uses what is stored in the files above rather than regenerating them. This will be updated so that it runs dynamically for any CSV stored in the mock_data_sets folder.

Steps to run sequentially:

db_profiling.py [RUN THIS]

This creates a database file, cloud_costs.db, containing one table for each CSV file in the 'mock_data_sets' folder. Since 'mock_data_sets' contains 'aws_cost_usage.csv' and 'azure_cost_usage.csv', it creates two tables with the same names: 'aws_cost_usage' and 'azure_cost_usage'.
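The README does not show how the database is built. Below is a minimal standard-library sketch of this step, assuming the .db file is a SQLite database and every CSV has a header row. The names load_csv_folder, _quote and _coerce are hypothetical, not the repository's code. Because it loops over every *.csv file in the folder, it also shows one way to make table creation follow whatever CSVs are in mock_data_sets.

```python
import csv
import sqlite3
from pathlib import Path


def _quote(name: str) -> str:
    """Quote an SQL identifier so column names with spaces or quotes are safe."""
    return '"' + name.replace('"', '""') + '"'


def _coerce(value: str):
    """Convert CSV text to int or float when possible; empty cells become NULL."""
    if value == "":
        return None
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def load_csv_folder(folder: str, db_path: str) -> list[str]:
    """Create (or replace) one table per CSV in folder, named after the file stem.

    Hypothetical helper: assumes a SQLite database and CSVs with a header row.
    """
    conn = sqlite3.connect(db_path)
    created = []
    try:
        for csv_file in sorted(Path(folder).glob("*.csv")):
            table = _quote(csv_file.stem)  # aws_cost_usage.csv -> "aws_cost_usage"
            with csv_file.open(newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                header = next(reader)
                cols = ", ".join(_quote(c) for c in header)
                marks = ", ".join("?" for _ in header)
                conn.execute(f"DROP TABLE IF EXISTS {table}")
                # Columns without a declared type keep each value's Python type.
                conn.execute(f"CREATE TABLE {table} ({cols})")
                conn.executemany(
                    f"INSERT INTO {table} VALUES ({marks})",
                    ([_coerce(v) for v in row] for row in reader),
                )
            created.append(csv_file.stem)
        conn.commit()
    finally:
        conn.close()
    return created
```

Usage would be load_csv_folder("mock_data_sets", "cloud_costs.db"). The per-cell int/float coercion is a simplification: it would, for example, drop leading zeros from ID-like columns, so a real loader would more likely infer one type per column.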

db_profiling.py [No need to run this]

This creates a basic statistical profile of every table in the database ('aws_cost_usage' and 'azure_cost_usage').
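Most fields in the profile shown in the llm_profling.py section (null_count, distinct_count, sample_values, min/max for numbers, min/max length for text) can be computed with plain SQL. This is a sketch that assumes a SQLite database; profile_table is a hypothetical name, it does not compute common_patterns, and it reports SQLite storage classes such as "real" and "text" where the repository's profile shows pandas dtypes like "float64".

```python
import sqlite3


def _q(name: str) -> str:
    """Quote an SQL identifier."""
    return '"' + name.replace('"', '""') + '"'


def profile_table(conn: sqlite3.Connection, table: str, n_samples: int = 10) -> dict:
    """Build a per-column profile shaped like the README example (hypothetical helper)."""
    t = _q(table)
    row_count = conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({t})")]
    profile = {"table_name": table, "row_count": row_count, "columns": {}}
    for col in columns:
        c = _q(col)
        # COUNT(col) skips NULLs, so the difference is the NULL count.
        nulls, distinct = conn.execute(
            f"SELECT COUNT(*) - COUNT({c}), COUNT(DISTINCT {c}) FROM {t}"
        ).fetchone()
        types = sorted(r[0] for r in conn.execute(
            f"SELECT DISTINCT typeof({c}) FROM {t} WHERE {c} IS NOT NULL"))
        samples = [r[0] for r in conn.execute(
            f"SELECT {c} FROM {t} WHERE {c} IS NOT NULL LIMIT ?", (n_samples,))]
        stats = {
            "null_count": nulls,
            "distinct_count": distinct,
            "data_type": "/".join(types) or "null",
            "sample_values": samples,
            "min_value": None,
            "max_value": None,
        }
        if types and set(types) <= {"integer", "real"}:
            # Numeric column: record the value range.
            stats["min_value"], stats["max_value"] = conn.execute(
                f"SELECT MIN({c}), MAX({c}) FROM {t}").fetchone()
        elif types:
            # Text-like column: record string lengths instead of a value range.
            stats["min_length"], stats["max_length"] = conn.execute(
                f"SELECT MIN(LENGTH({c})), MAX(LENGTH({c})) FROM {t}").fetchone()
        profile["columns"][col] = stats
    return profile
```

Calling it once per table, e.g. profile_table(sqlite3.connect("cloud_costs.db"), "aws_cost_usage"), yields dictionaries that can be serialized with json.dump.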

llm_profling.py [No need to run this]

This builds a prompt from the basic statistical profile, sends it to the Google AI Studio API, and receives a JSON response containing a short and a long description for each column in each table. Example:

    {
      "azure_cost_usage": {
        "table_name": "azure_cost_usage",
        "row_count": 5000,
        "columns": {
          "billedcost": {
            "null_count": 0,
            "distinct_count": 2902,
            "data_type": "float64",
            "sample_values": [
              0.059923125, 0.000139329578435, 4.09880552e-07, 3.37077e-06,
              2.4544512e-05, 2.7387643576e-05, 0.0233438184, 0.021306, 0.0,
              0.306324
            ],
            "min_value": 0.0,
            "max_value": 992.32695,
            "short_description": "Represents the actual cost billed for the Azure resource usage.",
            "long_description": "This column contains the cost in a floating-point number format, reflecting the finalized cost after discounts and adjustments. The value range is from 0.0 to 992.32695, indicating the cost can be zero or a positive value. It's likely in the currency specified by 'billingcurrency'."
          },
          "billingaccountid": {
            "null_count": 0,
            "distinct_count": 1,
            "data_type": "object",
            "sample_values": [
              "/providers/Microsoft.Billing/billingAccounts/92487455",
              "/providers/Microsoft.Billing/billingAccounts/92487455",
              "/providers/Microsoft.Billing/billingAccounts/92487455",
              "/providers/Microsoft.Billing/billingAccounts/92487455",
              "/providers/Microsoft.Billing/billingAccounts/92487455",
              "/providers/Microsoft.Billing/billingAccounts/92487455",
              "/providers/Microsoft.Billing/billingAccounts/92487455",
              "/providers/Microsoft.Billing/billingAccounts/92487455",
              "/providers/Microsoft.Billing/billingAccounts/92487455",
              "/providers/Microsoft.Billing/billingAccounts/92487455"
            ],
            "min_value": null,
            "max_value": null,
            "min_length": 53,
            "max_length": 53,
            "common_patterns": { "alphanumeric": 5000 },
            "short_description": "Unique identifier for the billing account associated with the Azure usage.",
            "long_description": "This column stores the billing account ID as an object (likely a string). All entries have the same value, indicating all usage is associated with a single billing account. The string is 53 characters long and consists of alphanumeric characters."
          }
        }
      }
    }
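The README does not show the prompt or the client code. Below is a minimal sketch of the round trip, assuming the model is asked to reply with a JSON object keyed by column name. describe_columns and the prompt wording are hypothetical, and generate is a placeholder for whatever function actually calls the model API and returns the reply text.

```python
import copy
import json
from typing import Callable

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in


def describe_columns(profile: dict, generate: Callable[[str], str]) -> dict:
    """Ask a model for column descriptions and merge them into a copy of profile.

    generate(prompt) -> reply text is a hypothetical stand-in for the API call.
    """
    prompt = (
        "You are documenting a database table for a text-to-SQL system.\n"
        f"Table: {profile['table_name']} ({profile['row_count']} rows)\n"
        "Column statistics as JSON:\n"
        f"{json.dumps(profile['columns'], default=str)}\n\n"
        "Reply with only a JSON object that maps each column name to an object "
        'with "short_description" and "long_description" string fields.'
    )
    reply = generate(prompt).strip()
    if reply.startswith(FENCE):
        # Drop a surrounding json-tagged Markdown fence before parsing.
        reply = reply.strip("`").removeprefix("json").strip()
    descriptions = json.loads(reply)
    result = copy.deepcopy(profile)  # leave the caller's profile untouched
    for column, desc in descriptions.items():
        if column in result["columns"]:  # ignore columns the model invented
            result["columns"][column]["short_description"] = desc.get("short_description", "")
            result["columns"][column]["long_description"] = desc.get("long_description", "")
    return result
```

Passing the model call in as a function keeps the merging logic testable without network access; the merged result has the same shape as the example above.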

api.py [Run this]

This is the FastAPI application that lets a user ask a question and get an answer. Run it, then use api_call.py to send questions and get the responses.
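The README does not show what api.py does internally. As a framework-independent sketch, a Text2SQL endpoint could wrap logic like the following, where a FastAPI route would call answer_question and return its dict as JSON. The function name, the to_sql callable (a stand-in for the model call that turns a question into SQL) and the response shape are all hypothetical, and the database is assumed to be SQLite.

```python
import sqlite3
from pathlib import Path
from typing import Callable


def answer_question(question: str, db_path: str,
                    to_sql: Callable[[str], str], max_rows: int = 100) -> dict:
    """Turn a question into SQL with to_sql, run it read-only, return the rows."""
    sql = to_sql(question).strip().rstrip(";").strip()
    # Accept a single SELECT/WITH statement only (this also rejects any ';'
    # inside string literals, which is deliberately conservative).
    if ";" in sql or not sql.lower().startswith(("select", "with")):
        raise ValueError(f"refusing to run non-SELECT SQL: {sql!r}")
    # mode=ro opens the database read-only, so a bad query cannot modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        cur = conn.execute(sql)
        columns = [d[0] for d in cur.description]
        rows = cur.fetchmany(max_rows)
    finally:
        conn.close()
    return {"question": question, "sql": sql, "columns": columns, "rows": rows}
```

Opening the database read-only is the main safeguard here; the SELECT check only gives a clearer error message before the query reaches SQLite.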

api_call.py [Python code to send requests and get responses]
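A minimal client in the spirit of api_call.py, using only the standard library. The /ask route and the {"question": ...} request body are assumptions, and port 8000 is simply uvicorn's default; match all three to what api.py actually defines.

```python
import json
import urllib.request


def ask(question: str, base_url: str = "http://127.0.0.1:8000",
        path: str = "/ask", timeout: float = 30.0) -> dict:
    """POST a question as JSON and return the decoded JSON reply.

    The route and payload shape are hypothetical placeholders.
    """
    body = json.dumps({"question": question}).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running locally, print(ask("What was the total Azure billed cost?")) would send one question and print whatever JSON the server returns.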
