# Vision 
The system I will submit is a deep neural network that takes raw Twitter data, processes it, vectorizes it, and then learns the stance of the user from it. In each tweet, a user is either for, against, or neutral on some topic. The system looks specifically at tweets about Australian mining companies.

This project focuses on the vectorization step of the process. I will compare term-frequency by itself to term-frequency combined with a word2vec-like system. The term-frequency algorithm is close to the "bag of words" approach: each word is weighted by its frequency, producing 'tweet-vectors' that should cluster based on the content of the tweets. The approach using individual term vectors (think word2vec) takes the frequency vector, along with the individual term embeddings, and creates another 'tweet-vector' that is 'informed' by what each term 'means'.
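The two kinds of tweet-vector described above can be sketched in plain Python. This is only an illustration under stated assumptions: the vocabulary, the two-dimensional embeddings, and the function names are made up for the example, and combining frequencies with embeddings as a frequency-weighted average is one plausible choice, not necessarily the one the final system will use.

```python
from collections import Counter


def term_frequency_vector(tokens, vocabulary):
    """Return one raw count per vocabulary term, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def embedding_tweet_vector(tokens, embeddings, dim):
    """Return the frequency-weighted average of the term embeddings.

    Terms without an embedding are skipped (an assumption of this sketch).
    """
    counts = Counter(t for t in tokens if t in embeddings)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * dim
    vector = [0.0] * dim
    for term, count in counts.items():
        for i, value in enumerate(embeddings[term]):
            vector[i] += count * value
    return [v / total for v in vector]


# Hypothetical toy vocabulary and embeddings, for illustration only.
vocabulary = ["adani", "coal", "jobs", "stop"]
embeddings = {
    "adani": [1.0, 0.0],
    "coal": [0.0, 1.0],
    "jobs": [1.0, 1.0],
    "stop": [-1.0, 0.0],
}

tweet = "stop adani stop coal".split()
tf_vector = term_frequency_vector(tweet, vocabulary)
emb_vector = embedding_tweet_vector(tweet, embeddings, dim=2)
```

Here `tf_vector` has one entry per vocabulary term, while `emb_vector` has one entry per embedding dimension, so its length does not grow with the vocabulary.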

The purpose of this project is to compare a system that uses only term-frequency to a system that uses term-frequency in combination with embeddings generated by tools like word2vec and fastText. I will primarily evaluate how well each performs in this specific area of stance analysis. The larger goal, and the reason I am taking on this sub-project, is a system that can take in new tweets and classify them as for, against, or neutral with regard to certain companies as accurately as possible.

# Background
The system will use Keras for a densely connected network, because of its simplicity. It will use a grid search to find a good size for the network, both in layer count and in individual layer size. It will also use [word2vec](https://en.wikipedia.org/wiki/Word2vec), specifically the Keras implementation, because it is a common tool for generating word embeddings and has many tutorials on how to use it. In addition, I may use [fastText](https://fasttext.cc/), depending on how much difference I find between word2vec and fastText (so far they seem fairly similar). fastText, created by Facebook, can be seen as an extension of word2vec that also uses subword (character n-gram) information. This project will also be heavily based on the work that Roy Adams and I did for our senior project.
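The grid search over layer count and layer size can be sketched as follows. This is a minimal outline, not the project's actual search: it assumes every hidden layer in a network has the same size, and `placeholder_score` is a hypothetical stand-in for building, training, and validating a Keras model.

```python
from itertools import product


def architecture_grid(layer_counts, layer_sizes):
    """Yield one tuple of hidden-layer sizes per (count, size) pair."""
    for count, size in product(layer_counts, layer_sizes):
        yield (size,) * count


def grid_search(architectures, score_fn):
    """Score every architecture; return the best pair and all results."""
    results = [(arch, score_fn(arch)) for arch in architectures]
    best = max(results, key=lambda pair: pair[1])
    return best, results


def placeholder_score(arch):
    # Stand-in only: a real run would train a Keras model with these
    # hidden layers and return its validation accuracy.
    return 1.0 / (1 + abs(sum(arch) - 128))


grid = list(architecture_grid(layer_counts=[1, 2, 3], layer_sizes=[32, 64]))
best, results = grid_search(grid, placeholder_score)
```

Passing the scoring function in as an argument means the same search loop can be reused for both vectorization approaches.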

My data comes from Twitter. It was further processed by CSIRO and labeled by both CSIRO and others, including Professor Keith VanderLinden and Roy Adams of Calvin College. The tweets gathered were tweeted from Australia and concern certain Australian mining companies, such as Adani.

# Implementation
The system is implemented solely in Python, unless I change it to use fastText. It uses Keras for the densely connected neural networks and for the word transformations/vectorizations. The Keras networks train on vectors generated from the data. There are two versions of the program, which differ only in the vectorization approach they use, as detailed above. The networks run on Borg, the Calvin supercomputer, specifically on its GPU nodes, inside a Singularity container so that the code is easily portable if desired. The results are stored in two CSV files: one holds the results of the term-frequency vectorization method, and the other holds the results of the method that combines term-frequency with word2vec. Both versions run the same grid search over an array of network architectures.
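Storing the grid-search results as CSV could look roughly like the sketch below. The column names, the `write_results` helper, and the accuracy values are all hypothetical placeholders chosen for the example; they are not results from this project.

```python
import csv
import io

# Hypothetical column layout for one results file.
FIELDNAMES = ["vectorizer", "hidden_layers", "accuracy"]


def write_results(handle, vectorizer, results):
    """Write one CSV row per (hidden_layers, accuracy) pair."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    for hidden_layers, accuracy in results:
        writer.writerow({
            "vectorizer": vectorizer,
            # Encode the architecture as e.g. "64-64" for two layers of 64.
            "hidden_layers": "-".join(str(n) for n in hidden_layers),
            "accuracy": f"{accuracy:.4f}",
        })


# Placeholder numbers, not measured accuracies.
buffer = io.StringIO()
write_results(buffer, "term_frequency", [((64, 64), 0.5), ((32,), 0.25)])
csv_text = buffer.getvalue()
```

The example writes to an in-memory buffer; for a real file, the handle would come from `open(path, "w", newline="")`, with one call per vectorization method.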

# Results
Currently, a system using densely connected layers and term-frequency alone has reached an accuracy of around 74%. Further results are not yet available.

# Implications
If I am able to increase the accuracy of this system, I will have helped improve a computer's ability to perform stance analysis of Twitter data. Stance analysis, like sentiment analysis, has business implications. If companies can understand the stance of the subset of the population that engages with them, they can better judge whether they have a social license to start a new operation or continue a current one.

Another area in which it could be used is the political arena. Politicians could use stance analysis to see where people stand with regard to the politicians themselves or to certain policies. This would allow them to better understand where a subset of their constituents stands on certain matters.

However, stance analysis can also be misused. If someone is able to mass-analyze text and identify the stance certain people took, they could use it as a tool of control. For example, in a nation like China, which is increasing restrictions on speech every day, a system like this could be used to weed out dissenters more efficiently.