A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video
USC Information Retrieval and Data Science Group
Conceptual, temporal, and spatial analysis of the TREC Polar dataset
Domain Discovery for the Sparkler Crawl Environment
Using the Google Search API, we collect URLs relevant to the Polar Domain for deep insights and intelligent crawling
This code connects to the Solr DB created for Sparkler crawled data to do further data extraction, classification, filtering, and insight generation using various machine learning models. The ML models can use a keyword list from the user, extract features from URL content, classify (score) output, and update Solr parameter accordin…
Spark-Crawler: Evolving Apache Nutch to run on Spark.
Python tools for parsing documents and building the inverted index with enriched metadata. Java version with slightly different features - https://github.com/USCDataScience/parser-indexer
LDA Topic Modeling for Polar Data Insights
Collection of projects from IRDS students studying unidentified flying objects
Models and associated helper code for the GSoC 2017 project TensorFlow Image to Text in Apache Tika
Polar USC activities related to the NSF Polar CyberInfrastructure program at the University of Southern California
Age classification from text using PAN16, blogs, Fisher Callhome, and Cancer Forum
Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.
A set of tools for extracting tables from PDF files, to help with data mining on (OCR-processed) scanned documents.
This is a FacetView setup for crawled ocean observation data.
Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser
A simple timeline component whose labels do not overlap.
File byte histogram machine learning classification
Web UI for labelling datasets for supervised learning.
A scalable Apache Hadoop-based implementation of the Pooled Time Series video similarity algorithm, based on the CVPR 2015 paper by M. Ryoo et al.
A toolkit for clustering web pages based on various similarity measures.
Public and free annotated datasets of relationships between entities/nominals
Domain Discovery on Polar Domain
Columbia Image Search tool for MEMEX
This is a REST Server endpoint built using Flask and Python.
Image recognition on Spark cluster powered by Deeplearning4j and Apache Tika
This repository contains Deeplearning4j examples for importing and making use of models trained in Keras
A Ruby parser using linkeddata and RDF to fetch the JPL SWEET ontology and load it into Neo4j for cool graph queries and examination.