[C4GT] Performance, Cost Optimization, Benchmarking #28

Open
12 tasks
ChakshuGautam opened this issue May 15, 2023 · 12 comments

@ChakshuGautam
Collaborator

ChakshuGautam commented May 15, 2023

Project Details

Text2SQL is an application that allows users to interact with their data using natural language queries. It currently supports only SQL-based querying, though the implementation is not limited to SQL. Text2SQL provides APIs to generate the appropriate query (SQL or otherwise) and return the data you need.

Features to be implemented

Token Optimization

Improve token usage with OpenAI
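
One common way to cut prompt size is to send only the parts of the schema that look relevant to the question. The sketch below is a minimal illustration of that idea, not the project's implementation: the `SCHEMA` contents, the helper names, and the keyword-overlap heuristic are all assumptions, and the word count is only a crude stand-in for a real tokenizer.

```python
import re

# Hypothetical schema for illustration: table name -> column names.
SCHEMA = {
    "students": ["student_id", "full_name", "grade"],
    "schools": ["school_id", "school_name", "district"],
    "attendance": ["student_id", "day", "present"],
}


def terms(text):
    """Lowercase alphabetic words with a naive plural strip ('schools' -> 'school')."""
    found = re.findall(r"[a-z]+", text.lower())
    return {w[:-1] if w.endswith("s") else w for w in found}


def prune_schema(question, schema):
    """Keep only tables whose name or column names share a word with the question."""
    q = terms(question)
    kept = {
        table: cols
        for table, cols in schema.items()
        if terms(table + " " + " ".join(cols)) & q
    }
    # Fall back to the full schema when nothing matches, so the prompt is never empty.
    return kept or dict(schema)


def build_prompt(question, schema):
    """Render a compact 'table(col, col)' schema followed by the question."""
    lines = [f"{table}({', '.join(cols)})" for table, cols in schema.items()]
    return "Schema:\n" + "\n".join(lines) + f"\nQuestion: {question}\nSQL:"


question = "Which district has the most schools?"
full_prompt = build_prompt(question, SCHEMA)
small_prompt = build_prompt(question, prune_schema(question, SCHEMA))
# Word count is a rough proxy only; actual token counts depend on the model's tokenizer.
print(len(full_prompt.split()), "->", len(small_prompt.split()))
```

Keyword overlap is deliberately simple and will miss synonyms; an embedding-based retriever would be a natural next step if this heuristic proves too lossy on the benchmarks.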

Alternate Models Evaluation

Models to be evaluated

Domain Mapping to Schema

  • Handle cases where the database or table names are not intuitive
  • Handle cases where the data itself must be inspected to figure out viable filters
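
Both cases above can be sketched with a small glossary plus a value sampler. This is an illustrative assumption about how the mapping could work, not the project's design: the `GLOSSARY` entries, the cryptic table `t_std_01`, and the helper names are all hypothetical, and SQLite is used only so the example runs on its own.

```python
import sqlite3

# Hypothetical glossary mapping cryptic identifiers to domain descriptions.
GLOSSARY = {
    "t_std_01": "students",
    "t_std_01.c3": "district the student lives in",
}


def describe(identifier, glossary):
    """Return 'identifier (description)' when a description is known."""
    description = glossary.get(identifier)
    return f"{identifier} ({description})" if description else identifier


def sample_distinct_values(conn, table, column, limit=5):
    """Fetch up to `limit` distinct values so a prompt can list valid filter values."""
    # Identifiers cannot be bound as parameters, so only pass names taken
    # from the trusted schema here, never from user input.
    query = f'SELECT DISTINCT "{column}" FROM "{table}" ORDER BY 1 LIMIT ?'
    return [row[0] for row in conn.execute(query, (limit,)).fetchall()]


# Tiny in-memory dataset for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_std_01 (c1 INTEGER, c2 TEXT, c3 TEXT)")
conn.executemany(
    "INSERT INTO t_std_01 VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Agra"), (3, "Meera", "Pune")],
)

print(describe("t_std_01.c3", GLOSSARY))
print(sample_distinct_values(conn, "t_std_01", "c3"))
```

Including a few sampled values in the prompt lets the model write `WHERE c3 = 'Pune'` with a value that actually exists, at the cost of some extra tokens per column.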

Test Cases/Benchmarking

Add public test cases to test out the current model.
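
A test case for a text-to-SQL model is usually scored by running the predicted query and a reference query against the same database and comparing the results. The sketch below shows one way this could look; the table, the sample cases, and the function names are assumptions made for illustration, with SQLite standing in for the real database.

```python
import sqlite3
from collections import Counter


def run(conn, sql):
    """Return rows as a multiset so row order does not matter; None on SQL error."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, cases):
    """cases: list of (predicted_sql, gold_sql). Returns the fraction that match."""
    hits = 0
    for predicted, gold in cases:
        expected = run(conn, gold)
        if expected is not None and run(conn, predicted) == expected:
            hits += 1
    return hits / len(cases) if cases else 0.0


# Hypothetical fixture data and test cases.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, grade INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Asha", 5), ("Ravi", 5), ("Meera", 6)],
)

CASES = [
    # Same rows in a different order: counted as correct.
    ("SELECT name FROM students WHERE grade = 5 ORDER BY name DESC",
     "SELECT name FROM students WHERE grade = 5"),
    # Identical queries: correct.
    ("SELECT grade, COUNT(*) FROM students GROUP BY grade",
     "SELECT grade, COUNT(*) FROM students GROUP BY grade"),
    # Missing filter: wrong count.
    ("SELECT COUNT(*) FROM students",
     "SELECT COUNT(*) FROM students WHERE grade = 5"),
    # Invalid SQL: counted as a failure rather than raising.
    ("SELEC name FROM students",
     "SELECT name FROM students"),
]

print(execution_accuracy(conn, CASES))
```

Comparing results as multisets ignores row order, so questions where ordering is part of the answer (for example "top 3 by grade") would need an order-sensitive comparison instead.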

Learning Path

Complexity

Complex

Skills Required

Python, Knowledge of HuggingFace Transformers, NLP, SQL, Databases.

Name of Mentors:

@ChakshuGautam

Project size

8 Weeks

Product Set Up

See the setup here

Acceptance Criteria

  • Evaluation Matrix of Model vs Use Case
  • Solve for a single Education domain and test it on a new schema
  • Run test cases and update benchmarks
  • Token usage chart to be shared showing improvements on benchmarks with smaller prompts
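
The evaluation matrix in the first criterion could be assembled from per-test-case results along these lines. This is only a sketch: the model names, use-case labels, and result tuples are placeholders, not measured results.

```python
from collections import defaultdict


def evaluation_matrix(results):
    """results: iterable of (model, use_case, passed) -> {model: {use_case: accuracy}}."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [passed, total]
    for model, use_case, passed in results:
        cell = counts[model][use_case]
        cell[0] += int(passed)
        cell[1] += 1
    return {
        model: {use_case: p / t for use_case, (p, t) in cases.items()}
        for model, cases in counts.items()
    }


def format_matrix(matrix):
    """Render the matrix as a Markdown table, with 'n/a' for untested cells."""
    use_cases = sorted({u for row in matrix.values() for u in row})
    lines = [
        "| Model | " + " | ".join(use_cases) + " |",
        "|---" * (len(use_cases) + 1) + "|",
    ]
    for model in sorted(matrix):
        cells = [
            f"{matrix[model][u]:.2f}" if u in matrix[model] else "n/a"
            for u in use_cases
        ]
        lines.append(f"| {model} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Placeholder results; real entries would come from running the test cases.
RESULTS = [
    ("model_a", "filter", True),
    ("model_a", "filter", False),
    ("model_a", "join", True),
    ("model_b", "filter", True),
]

print(format_matrix(evaluation_matrix(RESULTS)))
```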

C4GT

This issue is nominated for Code for GovTech (C4GT) 2023 edition.
C4GT is India's first annual coding program to create a community that can build and contribute to global Digital Public Goods. If you want to use Open Source GovTech to create impact, then this is the opportunity for you! More about C4GT here: https://codeforgovtech.in/

ChakshuGautam changed the title from "[C4GT] Performance/Token Optimizations" to "[C4GT] Performance, Cost Optimization, Benchmarking" on May 15, 2023
@HemanthSai7

HemanthSai7 commented May 18, 2023

Hello @ChakshuGautam ,
I am interested in contributing to this project. Could you please clarify the feature Alternate Model Evaluation? Does it mean trying out the models used in WikiSQL etc and reporting the results?

@ChakshuGautam
Collaborator Author

ChakshuGautam commented May 18, 2023

Yes @HemanthSai7. But not with their data - it needs to be a complete cycle of training the model for a domain and seeing the results. Can you start by creating test data first? Also, we don't need to do this for all of them, just some promising ones. I am looking at 3 max based on a literature review - the ones that have evolved the most.

@HemanthSai7

This comment was marked as off-topic.

@fibonacci35813

This comment was marked as off-topic.

@ChakshuGautam
Collaborator Author

Hey guys, let me break this down further and share it by EoD today.

@ManasaKaza

This comment was marked as off-topic.

@dixitdeeksha

This comment was marked as off-topic.

@rishabhv471

This comment was marked as off-topic.

@suyashgautam
Collaborator

Hey @rishabhv471, you can start by setting the project up in a Gitpod environment or on your local machine. For Gitpod, you can follow my video. For local setup, you can follow the readme. If you face any issues or have any questions, you can ping the Discord channel or ping me directly. Will be happy to help. Looking forward to your contribution.

@prajak002

This comment was marked as off-topic.

@AmanGadadare

This comment was marked as off-topic.

@ChakshuGautam

This comment was marked as off-topic.
