Scrapipy is an AI-powered web scraping dashboard built with Streamlit, Selenium, and LangChain. It lets users extract, summarize, and analyze web content through a user-friendly interface with LLM integration.
Live demo: https://scrapipy.streamlit.app/
- 🔍 Input a website URL and extract clean text using Selenium & BeautifulSoup
- 🧠 Analyze and summarize content using LLMs via langchain_together
- 📊 Interactive UI with Streamlit
- 🌐 Environment-secure configuration via .env or Streamlit Secrets
- 💬 Modular design for easy LLM and model integration
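As a rough illustration of the "extract clean text" step, the sketch below strips markup and script/style content from an HTML string. It uses only Python's standard-library `html.parser` as a stand-in so that it runs without extra dependencies. It is not Scrapipy's actual code, which relies on Selenium and BeautifulSoup, and the names `TextExtractor` and `extract_clean_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_clean_text(html):
    """Return the visible text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

In a scraping flow, the HTML string would typically be the page source returned by the browser driver after the page has loaded; the resulting text is what would then be passed to the LLM.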
- Streamlit – For building the web UI
- Selenium + BeautifulSoup – For scraping dynamic and static content
- LangChain + langchain_together – For LLM integration
- Python-dotenv – For environment variables
- OpenAI / Together API – For running language models
git clone https://github.com/akarshmi/Scrapipy.git
cd Scrapipy
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Create a .env file in the root folder:
OPENAI_API_KEY=your_openai_key_here
TOGETHER_API_KEY=your_together_api_key_here
Alternatively, add them securely on Streamlit Cloud under Secrets.
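A quick way to confirm the keys are visible to the app before launching is to check the process environment. The sketch below is a minimal example, assuming both keys listed above are required and have already been loaded into the environment (for example by python-dotenv or by the hosting platform). The names `REQUIRED_KEYS` and `missing_keys` are hypothetical and not part of Scrapipy.

```python
import os

# Assumption: both keys from the .env example are needed by the app.
REQUIRED_KEYS = ("OPENAI_API_KEY", "TOGETHER_API_KEY")


def missing_keys(env=None):
    """Return the required key names that are absent or empty in `env`.

    `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing configuration:", ", ".join(missing))
    else:
        print("All required keys are set.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching real credentials.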
streamlit run main.py
This app can be deployed instantly using Streamlit Cloud:
- Push your code to GitHub
- Go to Streamlit Cloud
- Click “New App”
- Select your repo and main.py as the entry point
- Add secrets (API keys), then deploy 🎉
Scrapipy/
├── main.py
├── parse.py
├── utils/
│ └── dom_utils.py
├── requirements.txt
└── README.md
Created with 💻 by Akarsh Mishra
Feel free to fork, star ⭐ and contribute!
This project is licensed under the MIT License.