A Streamlit web application that allows users to upload CSV files and ask natural language questions about their data using Groq's AI models.
- CSV File Upload: Easy drag-and-drop CSV file uploading
- Data Preview: Automatically displays the first few rows of uploaded data
- Natural Language Queries: Ask questions about your data in plain English
- AI-Powered Analysis: Uses Groq's LLaMA 3 70B model for intelligent data analysis
- Interactive Interface: Clean, user-friendly Streamlit interface
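- Python 3.7 or higher (recent releases of `streamlit` and `langchain-groq` no longer support 3.7, so a newer interpreter may be needed)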
- Groq API key (sign up at Groq Console)
- Clone or download the repository: run `git clone <your-repo-url>`, then `cd csv-data-analyst`
- Install the required dependencies: `pip install streamlit pandas langchain-groq python-dotenv`
- Set up environment variables: create a `.env` file in the project root directory containing `GROQ_API_KEY=your_groq_api_key_here`, replacing `your_groq_api_key_here` with your actual Groq API key
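
For illustration, here is a minimal sketch of how an application like this might read the key at startup and fail early with a readable message. The helper name `get_groq_api_key` is hypothetical and not taken from the project; the sketch assumes the key lives in `GROQ_API_KEY` as described above, and it uses `python-dotenv` only if it is installed.

```python
import os

try:
    # python-dotenv (listed in the dependencies) copies .env entries into os.environ.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # Fall back to variables already exported in the shell.


def get_groq_api_key(env=os.environ):
    """Return the Groq API key from `env`, or raise a clear error if it is missing.

    Hypothetical helper for illustration; the real app may handle this differently.
    """
    key = env.get("GROQ_API_KEY", "").strip()
    # Treat the README placeholder as "not configured" too.
    if not key or key == "your_groq_api_key_here":
        raise RuntimeError(
            "GROQ_API_KEY is not set; add it to the .env file in the project root."
        )
    return key
```

Passing the environment as an argument keeps the check easy to exercise without touching the real `.env` file.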
- Start the application: `streamlit run app.py`
- Upload your CSV file
  - Click on "Browse files" or drag and drop your CSV file
  - The app will automatically load and display a preview of your data
- Ask questions about your data
  - Type your question in the text input field
  - Examples of questions you can ask:
    - "What is the average value in the price column?"
    - "How many rows contain missing values?"
    - "What are the top 5 categories by count?"
    - "Show me summary statistics for all numerical columns"
    - "What trends do you see in the data?"
- View AI-generated insights
  - The AI will analyze your entire dataset and provide detailed answers
  - Results are displayed in an easy-to-read format
- CSV files (.csv)
- Files should be properly formatted with headers in the first row
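
If you want to check the header row before uploading, a small standalone script can do it. The sketch below is not part of the app; `check_header` is a hypothetical helper, and "properly formatted" is assumed here to mean a non-empty first row with no blank or duplicate column names.

```python
import csv
import io


def check_header(csv_text):
    """Return a list of problems found in the first row (empty list means it looks fine)."""
    header = next(csv.reader(io.StringIO(csv_text)), None)
    if not header:
        return ["file is empty or first row is blank"]

    names = [name.strip() for name in header]
    problems = []
    if any(name == "" for name in names):
        problems.append("blank column name")
    if len(set(names)) != len(names):
        problems.append("duplicate column name")
    return problems


print(check_header("name,price\nWidget,9.99\n"))
```

This only inspects column names; it cannot tell whether the first row is really a header rather than data.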
| Variable | Description | Required |
|---|---|---|
| `GROQ_API_KEY` | Your Groq API key for accessing the AI model | Yes |
- `streamlit`: Web application framework
- `pandas`: Data manipulation and analysis
- `langchain-groq`: Integration with Groq AI models
- `python-dotenv`: Environment variable management
This application uses the LLaMA 3 70B model from Groq, which provides:
- High-quality natural language understanding
- Efficient processing of structured data
- Detailed analytical responses
- Large CSV files may take longer to process
- The entire dataset is sent to the AI model for analysis
- API rate limits may apply based on your Groq plan
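
Because the whole dataset travels with every question, prompt size grows with file size. The sketch below shows one way a prompt could be assembled and roughly sized. Both function names and the prompt wording are hypothetical, and the figure of about four characters per token is a common rule of thumb, not something Groq or this project specifies.

```python
def build_prompt(csv_text, question):
    """Combine the full CSV text and the user's question into one prompt string.

    Illustrative only; the app's actual prompt is not documented here.
    """
    return (
        "You are a data analyst. Answer the question using only this CSV data.\n\n"
        f"{csv_text.strip()}\n\n"
        f"Question: {question.strip()}"
    )


def rough_token_estimate(text):
    """Estimate token count with an assumed ~4 characters per token heuristic."""
    return len(text) // 4


data = "category,price\nbooks,12.5\ntoys,7.0\n"
prompt = build_prompt(data, "What is the average value in the price column?")
print(rough_token_estimate(prompt), "tokens (rough estimate)")
```

An estimate like this can help you judge whether a file is likely to be slow or to run into plan limits before you send it.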
"API key not found" error
- Ensure your `.env` file is in the correct directory
- Verify your Groq API key is valid and properly formatted
File upload issues
- Check that your file is a valid CSV format
- Ensure the file size is within the upload limit (Streamlit's default is 200 MB per file)
- Verify the CSV has proper headers
Slow responses
- Large datasets may take time to process
- Consider using smaller sample datasets for testing
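
One simple way to produce a smaller test file is to copy the header and the first few rows into a new CSV. The sketch below uses only the standard library; `write_sample` and the file names in the usage note are hypothetical, and taking the first rows is an assumption that suits quick tests but may not be representative of the full data.

```python
import csv
import itertools


def write_sample(src_path, dst_path, max_rows=100):
    """Copy the header plus the first `max_rows` data rows from src_path to dst_path.

    Returns the number of data rows written (header excluded).
    """
    rows_written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        # max_rows + 1 because the first row read is the header.
        for row in itertools.islice(csv.reader(src), max_rows + 1):
            writer.writerow(row)
            rows_written += 1
    return max(rows_written - 1, 0)
```

For example, `write_sample("sales.csv", "sales_sample.csv", max_rows=50)` would create a 50-row file to upload while you experiment with questions.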
- Never commit your `.env` file to version control
- Keep your API keys secure and don't share them publicly
- The application sends your data to Groq's servers for processing
Once you've uploaded your CSV, try asking these types of questions:
- Statistical Analysis: "What are the mean, median, and mode of the sales column?"
- Data Quality: "Are there any missing values in the dataset?"
- Trends: "What patterns do you notice in the data over time?"
- Comparisons: "Compare the performance between different categories"
- Summaries: "Give me a summary of the key insights from this data"
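
AI-generated numbers are worth spot-checking. As a hedged sketch, the snippet below computes the mean, median, mode, and missing-value count for one column locally so you can compare them with the model's answer. `column_stats` is a hypothetical helper, the `sales` column and sample rows are made up for illustration, and every non-empty cell is assumed to be numeric.

```python
import csv
import io
import statistics


def column_stats(csv_text, column):
    """Compute basic statistics for a numeric column, counting empty cells as missing.

    Raises statistics.StatisticsError if the column has no numeric values.
    """
    values = []
    missing = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = (row.get(column) or "").strip()
        if cell == "":
            missing += 1
        else:
            values.append(float(cell))
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "missing": missing,
    }


sample = "region,sales\nA,10\nB,20\nC,20\nD,\nE,30\n"
print(column_stats(sample, "sales"))
```

If the local figures and the AI's answer disagree, trust the local computation and rephrase the question or reduce the dataset.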