@USCDataScience

USC Information Retrieval & Data Science

USC Information Retrieval and Data Science Group

Pinned repositories

  1. sparkler

    Spark-Crawler : Evolving Apache Nutch to run on Spark.

    Java 190 96

  2. SentimentAnalysisParser

    Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.

    24 7

  3. autoextractor

    Forked from thammegowda/autoextractor

    A toolkit for clustering web pages based on various similarity measures.

    Java 14 2

  4. polar.usc.edu

    Polar USC activities related to NSF Polar CyberInfrastructure program at the University of Southern California

    HTML 10 34

  5. parser-indexer-py

    Python tools for parsing documents and building the inverted index with enriched metadata. Java version with slightly different features - https://github.com/USCDataScience/parser-indexer

    Jupyter Notebook 7 2

  • A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video

    3 1 Apache-2.0 Updated Jul 7, 2018
  • USC Information Retrieval and Data Science Group

    HTML 8 21 Apache-2.0 Updated Jul 6, 2018
  • Conceptual - Temporal - Spatial analysis of the TREC Polar dataset

    JavaScript 8 5 Updated Jul 5, 2018
  • Domain Discovery for the Sparkler Crawl Environment

    HTML 3 Apache-2.0 Updated Jul 2, 2018
  • Uses the Google Search API to collect URLs relevant to the Polar domain for deep insights and intelligent crawling

    HTML 3 2 Updated Jun 28, 2018
  • This code connects to the Solr DB created for Sparkler crawled data to perform further data extraction, classification, filtering, and insight generation using various machine learning models. The ML models can use a keyword list from the user, extract features from URL content, classify (score) output, and update Solr parameter accordin…

    Python 3 3 Updated Jun 28, 2018
  • Spark-Crawler : Evolving Apache Nutch to run on Spark.

    Java 190 96 Apache-2.0 Updated Jun 19, 2018
  • Python tools for parsing documents and building the inverted index with enriched metadata. Java version with slightly different features - https://github.com/USCDataScience/parser-indexer

    Jupyter Notebook 7 2 Apache-2.0 Updated Jun 17, 2018
  • LDA Topic Modeling for Polar Data Insights

    Jupyter Notebook 1 1 Updated Jun 8, 2018
  • Collection of projects from IRDS students studying unidentified flying objects

    HTML 5 22 Apache-2.0 Updated Apr 30, 2018
  • Models and associated helper code for the GSoC 2017 project TensorFlow Image to Text in Apache Tika

    Python 5 13 Apache-2.0 Updated Apr 21, 2018
  • Polar USC activities related to NSF Polar CyberInfrastructure program at the University of Southern California

    HTML 10 34 Apache-2.0 Updated Apr 11, 2018
  • Age classification from text using PAN16, blogs, Fisher Callhome, and Cancer Forum

    Java 7 7 Apache-2.0 Updated Feb 26, 2018
  • Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.

    24 7 Apache-2.0 Updated Jan 16, 2018
  • A set of tools for extracting tables from PDF files, to support data mining on (OCR-processed) scanned documents.

    Python 1 156 Apache-2.0 Updated Jan 9, 2018
  • This is a FacetView setup for ocean observation Crawled Data.

    JavaScript 1 Apache-2.0 Updated Nov 12, 2017
  • Java 6 Updated Nov 1, 2017
  • Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser

    Java 1 3 GPL-3.0 Updated Oct 26, 2017
  • A simple timeline component whose labels do not overlap.

    JavaScript 28 Apache-2.0 Updated Oct 17, 2017
  • File Byte Histogram Machine Learning Classification

    R 3 4 Updated Sep 4, 2017
  • Web UI for labelling dataset for supervised learning.

    Python 37 11 Apache-2.0 Updated Sep 1, 2017
  • A scalable Apache Hadoop-based implementation of the Pooled Time Series video similarity algorithm, based on the CVPR 2015 paper by M. Ryoo et al.

    Java 6 2 Apache-2.0 Updated Jul 10, 2017
  • A toolkit for clustering web pages based on various similarity measures.

    Java 14 4 Updated Jul 10, 2017
  • Public and free annotated datasets of relationships between entities/nominals

    1 53 Updated Jun 25, 2017
  • Domain Discovery on Polar Domain

    HTML 1 Apache-2.0 Updated Jun 7, 2017
  • Columbia Image Search tool for MEMEX

    Jupyter Notebook 19 BSD-2-Clause Updated May 26, 2017
  • This is a REST Server endpoint built using Flask and Python.

    Java 19 12 Apache-2.0 Updated May 21, 2017
  • Image recognition on Spark cluster powered by Deeplearning4j and Apache Tika

    Java 2 4 Apache-2.0 Updated May 15, 2017
  • This repository contains Deeplearning4j examples for importing and using models trained in Keras

    Java 23 17 Apache-2.0 Updated May 7, 2017
  • A Ruby parser using linkeddata and RDF to fetch the JPL SWEET ontology and load it into Neo4j for cool graph queries and examination.

    Ruby 2 2 Updated Apr 18, 2017