CDS: NLP Research Team
Team Lead: Kenta Takatsu (CS '19)
Advisor: Prof. Thorsten Joachims
We are a student-led research team from Cornell Data Science (CDS), working on Natural Language Processing projects under Prof. Thorsten Joachims. This semester, we are participating in the Yelp Dataset Challenge to provide analytic insights from raw review text. Our final products are research papers that make use of machine learning algorithms and statistical validation. You can visit the subteam sections to see our individual work.
This past semester, we covered a wide range of research topics, from recommendation systems to deep style transfer. All of them fall under Natural Language Processing, which combines machine learning with text analysis.
All of the projects produced notable results: a recommendation system that beats an industry-standard algorithm, an accurate analytic tool for assessing business trends, a classifier that identifies locally popular users, and a writing style transfer method based on deep learning.
Members: Xuwen Shen (STAT '18), Xinzhe Yang (CS '20)
To give insight into overall ratings and build a personalized recommendation system that accounts for each user's preferences, we set out to extract hidden information from reviews, including an individual user’s preferences and a business’s properties (scores for each feature of the business). Finally, we created a model that combines the topics and overall ratings to produce a personalized rating for a specific user.
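The idea of combining per-topic business scores with a user's preferences can be illustrated with a small sketch. This is not the model from our paper: the function names, the topic labels, and the choice of a simple preference-weighted average are all assumptions made for illustration.

```python
def normalize(preferences):
    """Scale a user's topic preferences so they sum to 1.

    `preferences` maps a topic name to a non-negative weight
    (hypothetical representation, chosen for this sketch).
    """
    total = sum(preferences.values())
    if total <= 0:
        raise ValueError("preferences must contain a positive weight")
    return {topic: weight / total for topic, weight in preferences.items()}


def personalized_rating(user_preferences, business_scores):
    """Preference-weighted average of a business's per-topic scores.

    Assumes every topic in `user_preferences` also has a score in
    `business_scores`; a missing topic raises KeyError.
    """
    weights = normalize(user_preferences)
    return sum(weights[topic] * business_scores[topic] for topic in weights)


# Hypothetical user who cares three times more about food than service.
user = {"food": 3, "service": 1}
business = {"food": 4.0, "service": 2.0}
rating = personalized_rating(user, business)
```

With these made-up numbers the rating is pulled toward the food score, because that topic carries most of the user's weight. A real model would learn both the topics and the weights from review text rather than take them as input.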
Members: Kenta Takatsu (CS '19), Caroline Chang (CS '20)
We are developing a streamlined star-prediction system to better assess business performance using different types of classifiers. The system accounts for temporal trends in user review topics and for the strengths and weaknesses of business characteristics in latent space.
Members: Brandon Kates (BTRY '19), Brian Cheang (CS '20)
The objective of the project is to build and combine two models, a Local Expert Identifier and a Topical Expert Identifier, to identify 'experts' among Yelp users.
Members: Luca Leeser (INFO '18), Yuji Akimoto (ORIE '19), Ryan Butler (CS '19), Cameron Ibrahim (ORIE '20)
We are seeking to modify the neural style transfer algorithm proposed by Gatys et al. to make it applicable to text. Our goal is to devise an algorithm that is able to transfer the writing style of one review onto the content of another.
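For background, the original image-domain algorithm by Gatys et al. represents style through Gram matrices of feature maps and penalizes the squared difference between the Gram matrices of the generated and style inputs. The sketch below shows that per-layer style loss in plain Python. It illustrates the starting point only, not our text adaptation; the function names and the list-of-lists feature layout are assumptions made for this example.

```python
def gram_matrix(features):
    """Gram matrix of a feature map.

    `features` is a list of N channels, each a list of M activations.
    Entry (i, j) is the inner product of channel i with channel j.
    """
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in features]
        for row_i in features
    ]


def style_loss(generated, style):
    """Per-layer style loss from Gatys et al. for two feature maps.

    Computes sum((G - A)^2) / (4 * N^2 * M^2), where G and A are the
    Gram matrices of the generated and style features, N is the number
    of channels, and M is the number of positions per channel.
    """
    n = len(generated)
    m = len(generated[0])
    g = gram_matrix(generated)
    a = gram_matrix(style)
    squared_diff = sum(
        (g[i][j] - a[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return squared_diff / (4 * n ** 2 * m ** 2)


# Toy feature maps: 2 channels, 2 positions each.
generated = [[1, 0], [0, 1]]
style = [[1, 1], [0, 0]]
loss = style_loss(generated, style)
```

Because the Gram matrix sums over positions, it discards where a feature occurs and keeps only which features co-occur. That property is what makes it a style descriptor for images, and it is one reason carrying the idea over to text is not straightforward.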
You can visit our final papers from the following links:
- Extracting Rating Dimensions from Hidden Topics in Text Reviews
- Topic Modeling as a Trend-Aware Performance Metric
- Identifying Experts in the Yelp Dataset
- On the Use of K-Competitive Networks for Writing Style Transfer
How to get the code
The code uses git submodules, so to initialize them properly you need the --recurse-submodules option. Additionally, --depth 1 avoids cloning the full history, which makes the clone faster.
git clone --recurse-submodules --depth 1 https://github.com/CornellDataScience/Yelp-FA17.git