[GSoC 2013] Some new Insights by Joe Mathai
1 Uniqueness meter for TWEET/POST
Why this Insight ? An average tweet-savvy person might have more than 9000 tweets to his credit. The probability of the same idea repeating within those 9000 tweets is very high, and for a person who is conscious about what he tweets, a uniqueness meter provided as a plugin would be a great help: it would alert the user with a ratio or percentage match against previous tweet(s) and display those tweets too. The same idea can be applied to posts as well.
The idea for implementation is to use the shingling technique (set similarity using the Jaccard coefficient) for tweets. This can identify subsets that are common to multiple tweets, and since it is a kind of fuzzy matching, a threshold ratio or percentage can be set so that the Insight is generated only when the match is above it. If posts are also considered for uniqueness, the approach should still give a positive result, as it will identify a small portion of a repeated idea within a larger body of text. Alternative: If shingling is not feasible, a tweet analysis for common words can be used, ignoring commonly used words that are stored beforehand. This will not be as effective as the set similarity.
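A minimal sketch of the shingling idea, assuming word-level shingles of three words and a 50% threshold (both values, and all function names, are illustrative choices rather than part of the proposal):

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (word n-grams) in a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle so they can still be compared.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient: size of intersection over size of union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def uniqueness_matches(new_tweet, past_tweets, threshold=0.5, k=3):
    """Return (past_tweet, similarity) pairs at or above the threshold, best match first."""
    new_set = shingles(new_tweet, k)
    scored = ((old, jaccard(new_set, shingles(old, k))) for old in past_tweets)
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

Calling `uniqueness_matches(draft, past_tweets)` before posting would give the list of earlier tweets to show next to the percentage. Smaller values of `k` make the match more tolerant of rewording, at the cost of more false alarms.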
Visualization: For a visual representation the Google Charts gauge can be used, but it would appear a bit cluttered.
2 Additional feature to standout plugin --AMPLIFICATION factor
Why this Insight ?
The standout plugin currently just shows whether someone interesting (based on followers) is following you. It would be a more interesting and productive insight if we could calculate an amplification factor for that person, that is, the number of additional people you could possibly reach if that (standout) person re-tweets your tweet. Along with it, a word cloud could be built from the bios of the standout person's followers who are not among your followers, to show what kind of audience that is.
Amplification factor data also gives you an insight into the audience you miss out on when someone stops following you, and possible projections can be made.
Implementation: The basic idea is to compare the followers of that person with your followers and identify the number of unique followers. This is again a probabilistic estimate and just predicts the extent of the reach you could possibly gain from that person, added to your own followers. Along with the amplification factor, one can also form a word cloud from the followers of the standout person (using the most common words in their bios); comparing it with the word cloud of your own followers gives you an insight into the kind of people you are additionally reaching out to.
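The comparison could be sketched roughly as below, assuming follower lists are already available as collections of user ids and bios as plain strings. The function names and the tiny stop-word list are hypothetical and only for illustration:

```python
from collections import Counter
import re

# Illustrative stop-word list (an assumption); a real one would be much larger.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "at", "to", "i", "my", "for"}


def amplification_factor(my_followers, standout_followers):
    """Count the standout person's followers who do not already follow you."""
    return len(set(standout_followers) - set(my_followers))


def bio_word_counts(bios, top_n=10):
    """Most common non-stop-words across a list of bios, as input for a word cloud."""
    counts = Counter()
    for bio in bios:
        words = re.findall(r"[a-z']+", bio.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)
```

For example, `amplification_factor([1, 2, 3], [2, 3, 4, 5, 6])` counts the three followers (4, 5 and 6) that only the standout person has. The count is an upper bound on the extra reach, since not every follower will see a given re-tweet.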
Visualization: Possibly a word cloud; I haven't decided yet on the visualization for the amplification factor. Links: http://en.wikipedia.org/wiki/Tag_cloud
3 Track-me plugin
Why this Insight ?
- This plugin aims at giving the user an insight into his movements on the map, tracing them over a week/month/year from the data he generates. The idea is to collect data from Facebook check-ins, Foursquare check-ins and place information from tweets, and then plot the movement on a map.
- This gives the user an insight into where he generates which data from, and how frequently he visits a particular place.
- In addition, there could be a feature that shows the most popular tweets generated from a particular place, the most frequent time of visit, and a comparison of this week's movement with that of the last week or month.
Implementation: The Facebook API has the required documentation for check-ins, and the data can be fetched directly. The same goes for Foursquare as well as G+; they too have a check-in feature and the required documentation. The challenge lies in extracting the location from Twitter. I have gone through their forums, and the search API's geocode feature doesn't quite work the way it does in the other APIs. Alternative: An alternative to a Twitter check-in is to extract locations from the tweet text and mark them on the map.
Example: @joe_mathai hey i am at xyz restaurant with my friends and ....
Code can be written to extract the location "xyz restaurant" and check whether it exists (using the Places API from Google) or is mappable. This is just a possibility.
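A rough sketch of the extraction step only, assuming a simple pattern in which the place is whatever follows the word "at". The pattern and function name are hypothetical, and the lookup against a places service is deliberately left out:

```python
import re

# Hypothetical pattern: the place is the text after "at", up to a connecting
# word ("with", "and", "for"), a punctuation mark, or the end of the tweet.
PLACE_PATTERN = re.compile(
    r"\bat\s+(.+?)(?=\s+(?:with|and|for)\b|[.,!?]|$)",
    re.IGNORECASE,
)


def extract_place_candidate(tweet):
    """Return a candidate place name from a tweet, or None if nothing matches."""
    match = PLACE_PATTERN.search(tweet)
    return match.group(1).strip() if match else None
```

Every candidate would still need to be validated against a places lookup before being plotted, since a pattern like this will also catch phrases such as "at last" that are not locations.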
Visualisation: The Google Maps API would best suit this plugin, as it offers a feature that lets you draw lines of different specifications on the map, which could be used to track movements.
Why this Insight?
This insight aims at giving the user (who shares some opinion) a sense of how a particular tweet of his is being accepted by his followers. The idea is to analyse the replies to a particular tweet (sentiment analysis), calculate from them a level of acceptance of the tweet, and represent it on a bar graph that can be placed above current insights like "conversation starter".
The analysis can be done with a sentiment analysis model built using Google's Prediction API: https://developers.google.com/prediction/docs/sentiment_analysis The Google Prediction API lets you create a model from the data and associate a label with each example. It should not be very difficult to train a model for tweet analysis, and after a few test runs it can be configured as needed. Depending on the labels returned, one can easily calculate the net acceptance.
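One possible way to turn the returned labels into a single acceptance level is sketched below. The label names "positive", "negative" and "neutral" are assumptions that depend on how the model is trained, and ignoring neutral replies is a design choice rather than something fixed by the proposal:

```python
from collections import Counter


def acceptance_level(labels):
    """Fraction of opinionated replies that are positive, or None if there are none.

    `labels` holds one predicted label per reply. The label names used here
    ("positive", "negative", "neutral") are assumed, not given by any API.
    """
    counts = Counter(labels)
    opinionated = counts["positive"] + counts["negative"]
    if opinionated == 0:
        return None
    return counts["positive"] / opinionated
```

A value of 0.67, for instance, would fill two thirds of a like/dislike style bar.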
Visualization: A level-indicating bar graph might be suitable for representation. In the above picture the level indicator is the one used for likes/dislikes on YouTube videos.
5.Time vs Re-Tweet frequency
Why this Insight ?
- The idea is to give the user an insight into when his followers are most active, so that he can tap the maximum potential for his tweets.
- Many a time what you tweet doesn't reach the maximum audience, as different users are active on Twitter at different times.
- This plugin plots a distribution graph of your most re-tweeted tweets against time, so that next time you have a better idea of when to post something that doesn't have a time constraint on it.
Implementation: The implementation should be fairly simple, as the time and number of re-tweets can be extracted, and the visualization used above is a scatter chart from the Google Charts API. (Note: in the above graph the time scale is not properly set.)
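As a sketch of the data preparation behind such a chart, the tweets could be grouped by hour of day. This assumes each tweet is available as a (timestamp, re-tweet count) pair with timestamps already in the user's time zone; the function names are illustrative:

```python
from collections import defaultdict
from datetime import datetime


def retweets_by_hour(tweets):
    """Average re-tweet count per hour of day (0-23) for (timestamp, retweets) pairs."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for posted_at, retweets in tweets:
        totals[posted_at.hour] += retweets
        counts[posted_at.hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}


def best_hour(tweets):
    """Hour of day with the highest average re-tweet count, or None without data."""
    averages = retweets_by_hour(tweets)
    return max(averages, key=averages.get) if averages else None
```

The dictionary returned by `retweets_by_hour` maps directly onto the points of a scatter chart, and `best_hour` gives a single suggestion for when to post.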
6.Photo's like projection :
Why this Insight ?
- Facebook has always been a more complete experience because of the photos you share or are tagged in. This plugin aims to give the user a tool to see how well a photo of his is doing among his friends, by projecting the likes it is about to receive based on the number of likes it has already received (similar to upcoming milestone).
- It also shows the maximum number of likes you received for a photo you were tagged in with the same people, or a subset of the current people, and displays that photo.
- The user_photos permission can grant access to your photos and the ones you were tagged in. The other required documentation for the project is given at http://developers.facebook.com/docs/reference/api/photo/ which can be used to return the likes and the people tagged.
- If an album is uploaded, it will display only those photos that reach a particular preset like rate.
- The projection algorithm will be a simple rate-based one, where the time to reach a given number of likes will be shown in days or hours (similar to the upcoming milestone insight).
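A simple rate-based projection along these lines might look as follows. It assumes the like rate so far stays constant, and the function names and the 24-hour cut-off between hours and days are illustrative:

```python
def hours_to_reach(target_likes, current_likes, hours_since_upload):
    """Hours until target_likes at the average rate so far; None if no rate is known."""
    if current_likes >= target_likes:
        return 0.0
    if current_likes <= 0 or hours_since_upload <= 0:
        return None
    rate = current_likes / hours_since_upload  # likes per hour so far
    return (target_likes - current_likes) / rate


def format_eta(hours):
    """Render a projection in hours or days."""
    if hours is None:
        return "no projection"
    if hours < 24:
        return f"{hours:.0f} hours"
    return f"{hours / 24:.1f} days"
```

A photo with 40 likes after 10 hours is gaining 4 likes per hour, so `hours_to_reach(100, 40, 10)` projects 15 more hours to reach 100 likes. Likes usually slow down over time, so a constant rate will tend to be optimistic.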
7.Additional Feature to Time Machine :
why this Feature ?
- The aim here is to induce a sense of nostalgia by showing a relevant photo, or embedding a video, published by the user on the same day in the past (if there are several, choosing by most commented or most liked) along with their post/tweet.
- Again using the Facebook Graph API.
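The selection step could be sketched as below, assuming the photos and videos have already been fetched into dictionaries. The keys "published", "likes" and "comments" are hypothetical, and ranking by comments first and likes second is just one way to read "most commented or most liked":

```python
from datetime import date


def nostalgia_pick(items, today):
    """Pick the most engaging item published on today's month and day in an earlier year.

    Each item is a dict with the assumed keys "published" (a date),
    "likes" and "comments". Returns None if nothing matches.
    """
    same_day = [
        item for item in items
        if item["published"].year < today.year
        and (item["published"].month, item["published"].day) == (today.month, today.day)
    ]
    if not same_day:
        return None
    # Rank by comment count first, then by likes as a tie-breaker.
    return max(same_day, key=lambda item: (item["comments"], item["likes"]))
```

Items from the current year are skipped so that only posts from the past are shown.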