Analyze Twitter Feed using N1QL
Java
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
twitter-feed
.gitignore
LICENSE
app-sam.yml
readme.adoc

readme.adoc

Analyze Twitter Feeds using AWS Lambda and DynamoDB

This sample project uses an AWS Lambda function to retrieve and stores tweets of a tweeter. The tweets are stored in an Amazon DynamoDB instance. The function wakes up every 30 minutes.

The master branch uses DynamoDB. Refer to couchbase branch for the Couchbase setup.

Configure Twitter Credentials

  1. Create a new Twitter app

  2. Copy twitter-feed/src/main/resources/twitter.json.template to twitter-feed/src/main/resources/twitter.json

  3. Specify the required values

Create DynamoDB Table

  1. Create a table Twitter with password.

    1. Primary index using id_str.

  2. Specify the correct region for DynamoDB table in twitter-feed/dynamodb.props

Run Application using CLI

  1. Get home timeline of pre-defined user: mvn package exec:java

  2. Get home timeline a specific user: mvn package exec:java -Dtwitter.user=realDonaldTrump

Deploy Lambda

  1. Create package: mvn package

  2. Upload JAR to S3: mvn install

  3. Deploy Lambda stack:

    aws cloudformation deploy \
    --template-file app-sam.yml \
    --stack-name twitter \
    --region us-west-1
  4. Delete the stack (if a stack needs to be updated):

    aws cloudformation delete-stack \
    --stack-name twitter \
    --region us-west-1

Analyze Tweets using DynamoDB CLI

  1. Number of tweets stored

    Query
    aws dynamodb scan --table-name Twitter --select "COUNT"
    Result
    {
        "Count": 200,
        "ScannedCount": 200,
        "ConsumedCapacity": null
    }
  2. One tweet

    Query
    aws dynamodb scan --table-name Twitter --max-items 1
    Result
    {
        "Count": 200,
        "Items": [
            {
                "contributors": {
                    "NULL": true
                },
                "truncated": {
                    "BOOL": false
                },
                "text": {
                    "S": "...Trump/Russia story was an excuse used by the Democrats as justification for losing the election. Perhaps Trump just ran a great campaign?"
                },
                "is_quote_status": {
                    "BOOL": false
                },
                "in_reply_to_status_id": {
                    "NULL": true
                },
                "id": {
                    "N": "859604996236742656"
                },
                "favorite_count": {
                    "N": "83368"
                },
                "source": {
                    "S": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"
                },
                "retweeted": {
                    "BOOL": false
                },
                "coordinates": {
                    "NULL": true
                },
                "entities": {
                    "M": {
                        "user_mentions": {
                            "L": []
                        },
                        "symbols": {
                            "L": []
                        },
                        "hashtags": {
                            "L": []
                        },
                        "urls": {
                            "L": []
                        }
                    }
                },
                "in_reply_to_screen_name": {
                    "NULL": true
                },
                "id_str": {
                    "S": "859604996236742656"
                },
                "retweet_count": {
                    "N": "18820"
                },
                "in_reply_to_user_id": {
                    "NULL": true
                },
                "favorited": {
                    "BOOL": false
                },
                "user": {
                    "M": {
                        "follow_request_sent": {
                            "BOOL": false
                        },
                        "has_extended_profile": {
                            "BOOL": false
                        },
                        "profile_use_background_image": {
                            "BOOL": true
                        },
                        "id": {
                            "N": "25073877"
                        },
                        "verified": {
                            "BOOL": true
                        },
                        "translator_type": {
                            "S": "regular"
                        },
                        "entities": {
                            "M": {
                                "description": {
                                    "M": {
                                        "urls": {
                                            "L": []
                                        }
                                    }
                                }
                            }
                        },
                        "profile_image_url_https": {
                            "S": "https://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2_normal.jpg"
                        },
                        "profile_sidebar_fill_color": {
                            "S": "C5CEC0"
                        },
                        "is_translator": {
                            "BOOL": false
                        },
                        "profile_text_color": {
                            "S": "333333"
                        },
                        "followers_count": {
                            "N": "31947091"
                        },
                        "protected": {
                            "BOOL": false
                        },
                        "location": {
                            "S": "Washington, DC"
                        },
                        "default_profile_image": {
                            "BOOL": false
                        },
                        "id_str": {
                            "S": "25073877"
                        },
                        "is_translation_enabled": {
                            "BOOL": true
                        },
                        "utc_offset": {
                            "N": "-14400"
                        },
                        "statuses_count": {
                            "N": "35020"
                        },
                        "description": {
                            "S": "45th President of the United States of America"
                        },
                        "friends_count": {
                            "N": "45"
                        },
                        "profile_background_image_url_https": {
                            "S": "https://pbs.twimg.com/profile_background_images/530021613/trump_scotland__43_of_70_cc.jpg"
                        },
                        "profile_link_color": {
                            "S": "1B95E0"
                        },
                        "profile_image_url": {
                            "S": "http://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2_normal.jpg"
                        },
                        "notifications": {
                            "BOOL": false
                        },
                        "geo_enabled": {
                            "BOOL": true
                        },
                        "profile_background_color": {
                            "S": "6D5C18"
                        },
                        "profile_banner_url": {
                            "S": "https://pbs.twimg.com/profile_banners/25073877/1496423644"
                        },
                        "profile_background_image_url": {
                            "S": "http://pbs.twimg.com/profile_background_images/530021613/trump_scotland__43_of_70_cc.jpg"
                        },
                        "screen_name": {
                            "S": "realDonaldTrump"
                        },
                        "lang": {
                            "S": "en"
                        },
                        "profile_background_tile": {
                            "BOOL": true
                        },
                        "favourites_count": {
                            "N": "22"
                        },
                        "name": {
                            "S": "Donald J. Trump"
                        },
                        "url": {
                            "NULL": true
                        },
                        "created_at": {
                            "S": "Wed Mar 18 13:46:38 +0000 2009"
                        },
                        "contributors_enabled": {
                            "BOOL": false
                        },
                        "time_zone": {
                            "S": "Eastern Time (US & Canada)"
                        },
                        "profile_sidebar_border_color": {
                            "S": "BDDCAD"
                        },
                        "default_profile": {
                            "BOOL": false
                        },
                        "following": {
                            "BOOL": false
                        },
                        "listed_count": {
                            "N": "71274"
                        }
                    }
                },
                "geo": {
                    "NULL": true
                },
                "in_reply_to_user_id_str": {
                    "NULL": true
                },
                "lang": {
                    "S": "en"
                },
                "created_at": {
                    "S": "Wed May 03 03:06:16 +0000 2017"
                },
                "in_reply_to_status_id_str": {
                    "NULL": true
                },
                "place": {
                    "NULL": true
                }
            }
        ],
        "NextToken": "eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6IDF9",
        "ScannedCount": 200,
        "ConsumedCapacity": null
    }

Analyze Tweets using N1QL

  1. Top 5 tweeting days

    Query
    SELECT SUBSTR(MILLIS_TO_STR(TO_NUM(createdAt)), 0, 10) tweet_date,
           COUNT(1) tweet_count
    FROM   twitter
    GROUP  BY SUBSTR(MILLIS_TO_STR(TO_NUM(createdAt)), 0, 10)
    ORDER  BY COUNT(1) DESC
    LIMIT  5;
    Result
    [
      {
        "tweet_count": 12,
        "tweet_date": "2017-01-06"
      },
      {
        "tweet_count": 11,
        "tweet_date": "2016-12-04"
      },
      {
        "tweet_count": 10,
        "tweet_date": "2017-01-03"
      },
      {
        "tweet_count": 10,
        "tweet_date": "2017-01-04"
      },
      {
        "tweet_count": 9,
        "tweet_date": "2016-12-10"
      }
    ]
  2. How many days tweeted X times

    Query
    SELECT a.tweet_count, count(1) days FROM (
    SELECT SUBSTR(MILLIS_TO_STR(TO_NUM(createdAt)), 0, 10) tweet_date,
           COUNT(1) tweet_count
    FROM   twitter
    GROUP  BY SUBSTR(MILLIS_TO_STR(TO_NUM(createdAt)), 0, 10)
    ) a
    GROUP BY a.tweet_count
    ORDER BY a.tweet_count DESC;
    Result
    [
      {
        "days": 1,
        "tweet_count": 13
      },
      {
        "days": 1,
        "tweet_count": 12
      },
      {
        "days": 1,
        "tweet_count": 11
      },
      {
        "days": 2,
        "tweet_count": 10
      },
      {
        "days": 1,
        "tweet_count": 9
      },
      {
        "days": 7,
        "tweet_count": 8
      },
      {
        "days": 3,
        "tweet_count": 7
      },
      {
        "days": 7,
        "tweet_count": 6
      },
      {
        "days": 5,
        "tweet_count": 5
      },
      {
        "days": 5,
        "tweet_count": 4
      },
      {
        "days": 11,
        "tweet_count": 3
      },
      {
        "days": 2,
        "tweet_count": 2
      },
      {
        "days": 1,
        "tweet_count": 1
      }
    ]
  3. Most common hour in a day to tweet

    Query
    SELECT SUBSTR(MILLIS_TO_STR(TO_NUM(createdAt)), 11, 2) tweet_hour,
           COUNT(1) tweet_count
    FROM   twitter
    GROUP  BY SUBSTR(MILLIS_TO_STR(TO_NUM(createdAt)), 11, 2)
    ORDER  BY tweet_count DESC
    LIMIT  5;
    Result
    [
      {
        "tweet_count": 39,
        "tweet_hour": "13"
      },
      {
        "tweet_count": 27,
        "tweet_hour": "12"
      },
      {
        "tweet_count": 26,
        "tweet_hour": "11"
      },
      {
        "tweet_count": 20,
        "tweet_hour": "14"
      },
      {
        "tweet_count": 14,
        "tweet_hour": "00"
      }
    ]
  4. Day of the week to tweet

    Query
    SELECT DATE_PART_STR(MILLIS_TO_STR(TO_NUM(createdAt)), "day_of_week") day_of_week,
           COUNT(1) tweet_count
    FROM   twitter
    GROUP  BY DATE_PART_STR(MILLIS_TO_STR(TO_NUM(createdAt)), "day_of_week")
    ORDER  BY tweet_count DESC;
    Result
    [
      {
        "day_of_week": 0,
        "tweet_count": 40
      },
      {
        "day_of_week": 5,
        "tweet_count": 36
      },
      {
        "day_of_week": 2,
        "tweet_count": 36
      },
      {
        "day_of_week": 6,
        "tweet_count": 33
      },
      {
        "day_of_week": 1,
        "tweet_count": 33
      }
    ]

    RFE: Given a date, return the English name for it

  5. Top 5 mentions in tweets

    Query
    SELECT COUNT(1) user_count, ue.screenName
        FROM twitter
        UNNEST userMentionEntities ue
        GROUP by ue.screenName
        ORDER by user_count DESC
        LIMIT 5;
    Result
    [
      {
        "screenName": "realDonaldTrump",
        "user_count": 8
      },
      {
        "screenName": "FoxNews",
        "user_count": 7
      },
      {
        "screenName": "CNN",
        "user_count": 5
      },
      {
        "screenName": "DanScavino",
        "user_count": 5
      },
      {
        "screenName": "mike_pence",
        "user_count": 4
      }
    ]

    TODO: Talk about user_count vs User_count based upon which field should be shown first.

  6. Top 3 tweets with RTs

    Query
    SELECT retweetCount, text
    FROM twitter
    ORDER BY retweetCount
    LIMIT 5;
    Result
    [
      {
        "retweetCount": "10110",
        "text": "the American people. I have no doubt that we will, together, MAKE AMERICA GREAT AGAIN!"
      },
      {
        "retweetCount": "10140",
        "text": "Thank you to all of the men and women who protect & serve our communities 24/7/365! \n#LawEnforcementAppreciationDay… https://t.co/aqUbDipSgv"
      },
      {
        "retweetCount": "10370",
        "text": "We had a great News Conference at Trump Tower today. A couple of FAKE NEWS organizations were there but the people truly get what's going on"
      },
      {
        "retweetCount": "10414",
        "text": "these companies are able to move between all 50 states, with no tax or tariff being charged. Please be forewarned prior to making a very ..."
      },
      {
        "retweetCount": "10416",
        "text": "Somebody hacked the DNC but why did they not have \"hacking defense\" like the RNC has and why have they not responded to the terrible......"
      }
    ]
  7. Original tweets vs RTs

    Query
    SELECT retweet, count(1) count
    FROM twitter
    GROUP BY retweet;
    Result
    [
      {
        "count": 13,
        "retweet": true
      },
      {
        "count": 225,
        "retweet": false
      }
    ]
  8. Most common words

    Query
    SELECT count(1) count, word
    FROM twitter
    UNNEST split(text) word
    GROUP BY word
    ORDER BY count DESC;
    Result
    [
      {
        "count": 168,
        "word": "the"
      },
      {
        "count": 134,
        "word": "to"
      },
      {
        "count": 100,
        "word": "and"
      },
    
      . . .
    
      {
        "count": 1,
        "word": "Pres-Elect"
      },
      {
        "count": 1,
        "word": "dealing"
      },
      {
        "count": 1,
        "word": "asking"
      }
    ]
  9. How many times the following words are mentioned? A, B, C

    Query
    SELECT COUNT(1) count, LOWER(w) word
    FROM twitter
    UNNEST SPLIT(text) w
    WHERE LOWER(w) IN [ "media", "fake", "america"]
    GROUP by LOWER(w)
    ORDER BY count DESC;
    Result
    [
      {
        "count": 12,
        "word": "media"
      },
      {
        "count": 9,
        "word": "fake"
      },
      {
        "count": 8,
        "word": "america"
      }
    ]

    TODO: All reserved words are CAPITAL