A desktop search‑engine app over Twitter comments, built in Java Swing with Apache Lucene full‑text indexing. Scan large volumes of tweets, perform keyword queries, and explore results in a rich GUI.
Download Latest Release (Windows Executable)
- Overview
- Features
- Screenshots
- Technologies
- Getting Started
- Usage
- Architecture
- Contributing
- License
- Contact
- Acknowledgements
- Changelog
“GoogleFromLidl” is a Java Swing desktop application that lets you index and search Twitter comments using Apache Lucene. It loads raw tweet data, builds an inverted index, and provides instant full‑text queries with ranking. Use it to explore public sentiment, debug NLP pipelines, or prototype search features in pure Java.
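To make the idea of an inverted index concrete, here is a minimal, pure-Java sketch. It is not how Lucene or this project implements indexing: the class `ToyInvertedIndex` and its methods are hypothetical names chosen for illustration, and the ranking is a plain sum of term frequencies rather than Lucene's default BM25 scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy inverted index (illustrative only): term -> (document id -> term frequency). */
class ToyInvertedIndex {
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    /** Lower-cases the text and splits on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Stores the document, records each term occurrence, and returns the new document id. */
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    /** Returns ids of documents matching any query term, highest summed term frequency first. */
    List<Integer> search(String query) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            postings.getOrDefault(term, Map.of())
                    .forEach((docId, tf) -> scores.merge(docId, tf, Integer::sum));
        }
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a)); // higher score first
            return byScore != 0 ? byScore : Integer.compare(a, b);       // ties: lower id first
        });
        return ranked;
    }

    /** Returns the original text of a document. */
    String get(int docId) {
        return documents.get(docId);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("Lucene makes search fast");
        index.add("Search, search, and more SEARCH");
        index.add("Nothing relevant here");
        for (int docId : index.search("search")) {
            System.out.println(docId + ": " + index.get(docId));
        }
    }
}
```

The lookup cost depends on the number of documents that contain the query terms rather than on the size of the whole collection, which is why an index answers keyword queries much faster than scanning every tweet.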
- Full‑text indexing of tweet JSON files using Lucene’s `StandardAnalyzer` for tokenization and lower-casing.
- Advanced query syntax: boolean operators, phrase search, wildcard, fuzzy matching.
- Swing‑based GUI: sortable tables, live search box, and result highlighting.
- Configurable indexing: select fields (user, date, text), adjust analyzer settings.
- Export results to CSV for downstream analysis.
- Language: Java 11
- GUI: Java Swing (MVC pattern)
- Search Engine: Apache Lucene 8.x
- Logging: SLF4J + Logback
- Java 11 or higher installed
# Clone the repo
git clone https://github.com/johnprif/GoogleFromLidl.git
cd GoogleFromLidl
- Load Tweets: File -> Open JSON directory (each file contains tweet objects).
- Index: Click "Build Index" to parse and index all tweets.
- Search: Enter keywords or expressions in the search bar, then press Enter.
- Inspect: Click any result to view full tweet details and metadata.
- Export: Results -> Export to CSV.
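The export step can be sketched as follows. This is an assumption about how such an export might work, not the application's actual code: `CsvExportSketch`, `escape`, and `write` are hypothetical names, and the column names simply reuse the fields mentioned in this README (user, date, text). Tweet text often contains commas, quotes, and line breaks, so fields are quoted where needed.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

/** Illustrative CSV writer; names and columns are assumptions, not the app's real exporter. */
class CsvExportSketch {

    /** Quotes a field if it contains a comma, quote, or line break; doubles embedded quotes. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    /** Writes a header row followed by one row per result. */
    static void write(Writer out, List<String> header, List<List<String>> rows) throws IOException {
        writeRow(out, header);
        for (List<String> row : rows) {
            writeRow(out, row);
        }
    }

    private static void writeRow(Writer out, List<String> fields) throws IOException {
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.write(',');
            }
            out.write(escape(fields.get(i)));
        }
        out.write("\r\n"); // CRLF line endings, as commonly used for CSV files
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        write(out,
                List.of("user", "date", "text"),
                List.of(List.of("alice", "2023-04-27", "Hi, all"),
                        List.of("bob", "2023-04-27", "She said \"hello\"")));
        System.out.print(out);
    }
}
```

To write to disk instead of a string, the same `write` method could be given a writer from `Files.newBufferedWriter(path, StandardCharsets.UTF_8)`.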
+----------------+      +-----------------+      +------------------+
|   Swing GUI    | <--> |   Controller    | <--> | Lucene Index API |
+----------------+      +-----------------+      +------------------+
                                 |
                                 v
                      +--------------------+
                      | Tweet JSON Parser  |
                      +--------------------+
- MVC pattern separates UI (Swing GUI), control logic (Controller), and search engine integration (Lucene Index API) into distinct layers for modularity and testability.
- Swing GUI is implemented as the View in MVC, rendering components on Swing's single UI thread (the Event Dispatch Thread) and dispatching user events to the Controller.
- Controller mediates between the GUI and the Lucene model: it builds queries, triggers indexing and search operations, and updates the view with results.
- Lucene Index API (Model) uses `FSDirectory` to persist the inverted index on disk for durability and fast lookup.
- Tweet JSON Parser converts raw tweet JSON into Lucene `Document` instances with fields (text, user, date) for indexing.
- Analyzer: `StandardAnalyzer` (with optional custom stop-word list) tokenizes, lower-cases, and filters terms during both indexing and querying.
Lucene's `FSDirectory.open(Paths.get(indexPath))` chooses the optimal file-system implementation (SimpleFSDirectory, NIOFSDirectory, or MMapDirectory) based on the environment.
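The layering described above can be sketched in plain Java. Everything here is hypothetical and chosen for illustration: the interfaces `SearchModel` and `SearchView`, the `SearchController` class, and the in-memory model that stands in for the Lucene-backed one. The actual classes in this repository may be organized differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Demo entry point: wires a console view to an in-memory model through the controller. */
class MvcSketch {
    public static void main(String[] args) {
        SearchView consoleView = new SearchView() {
            @Override public void showResults(List<String> results) {
                results.forEach(r -> System.out.println("HIT: " + r));
            }
            @Override public void showError(String message) {
                System.out.println("ERROR: " + message);
            }
        };
        SearchController controller = new SearchController(new InMemoryModel(), consoleView);
        controller.onBuildIndex(List.of("Java Swing is old but solid", "Lucene powers search"));
        controller.onSearch("lucene");
        controller.onSearch("   ");
    }
}

/** Model: anything that can index tweets and answer queries (hypothetical interface). */
interface SearchModel {
    void index(List<String> tweets);
    List<String> search(String query);
}

/** View: anything that can display results or errors (hypothetical interface). */
interface SearchView {
    void showResults(List<String> results);
    void showError(String message);
}

/** Controller: validates user input, calls the model, and pushes results to the view. */
class SearchController {
    private final SearchModel model;
    private final SearchView view;

    SearchController(SearchModel model, SearchView view) {
        this.model = model;
        this.view = view;
    }

    void onBuildIndex(List<String> tweets) {
        model.index(tweets);
    }

    void onSearch(String rawQuery) {
        String query = rawQuery == null ? "" : rawQuery.trim();
        if (query.isEmpty()) {
            view.showError("Enter a search term.");
            return;
        }
        view.showResults(model.search(query));
    }
}

/** Stand-in for the Lucene-backed model: case-insensitive substring matching. */
class InMemoryModel implements SearchModel {
    private final List<String> tweets = new ArrayList<>();

    @Override public void index(List<String> newTweets) {
        tweets.addAll(newTweets);
    }

    @Override public List<String> search(String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String tweet : tweets) {
            if (tweet.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(tweet);
            }
        }
        return hits;
    }
}
```

Because the controller depends only on the two interfaces, it can be tested without a GUI or an index on disk. In a real Swing view, long-running model calls such as indexing would typically run off the Event Dispatch Thread (for example in a `SwingWorker`) so the interface stays responsive.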
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/foo`)
- Commit your changes (`git commit -m "Add feature"`)
- Push (`git push origin feature/foo`)
- Open a Pull Request
MIT License. See LICENSE
- GitHub: joanisprifti
- Email: joanisprifti@gmail.com
- Phone: +306940020178
- othneildrew/Best-README-Template for structure inspiration.
- FreeCodeCamp article on writing good READMEs.
- GitHub Docs on basic Markdown syntax and TOC support.
- Hatica blog on eye-catching README design.
- v1.0 (2023-04-27): Initial release.