## Coursera Project
### Clean and Analyze Social Media Usage Data with Python
ShotOnIphone

#### Summary :
1. Create a comprehensive report that analyzes the performance of different categories of social media posts.
2. You work for a social media marketing company that specializes in promoting brands and products on a popular social media platform.
3. You are responsible for analyzing the performance of different types of posts by category (health, family, food, etc.) to help clients optimize their social media strategy and increase their reach and engagement.
4. The team will use your analysis to make data-driven recommendations to clients to improve their social media performance. This analysis will help the marketing agency deliver tweets on time and within budget, and get results quickly.

#### Objectives :
1. Increase client reach and engagement
2. Gain valuable insights that will help improve social media performance
3. Achieve their social media goals and provide data-driven recommendations

#### Challenge :
1. As the social media analyst, you are responsible for collecting, cleaning, and analyzing data on a client's social media posts.
2. You are also responsible for communicating the insights and making data-driven recommendations to clients to improve their social media performance.
3. You need to set up the environment, identify the categories for the posts (fitness, tech, family, beauty, etc.), and then process, analyze, and visualize the data.

#### Readings :
https://towardsdatascience.com/how-to-access-data-from-the-twitter-api-using-tweepy-python-e2d9e4d54978

### i. ETL
1. Extract : from Twitter API to JSON
2. Transform : from JSON to a pandas DataFrame
3. Load : into analysis
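
The three steps above can be sketched offline, without calling the API. This is a minimal illustration only: `raw_json` is a made-up payload shaped like a Twitter API v2 search response, and the helper names `extract` and `transform` are assumptions introduced here, not part of the notebook.

```python
import json
import pandas as pd

# Hypothetical payload, trimmed to the fields this notebook uses.
raw_json = """
{
  "data": [
    {
      "id": "1",
      "text": "Sunset shot on my phone #mobilephotography",
      "created_at": "2024-12-24T15:57:10.000Z",
      "public_metrics": {"like_count": 9, "retweet_count": 3,
                         "reply_count": 0, "quote_count": 0},
      "context_annotations": [
        {"domain": {"id": "67", "name": "Interests and Hobbies"},
         "entity": {"id": "847869714860605440", "name": "Photography"}},
        {"domain": {"id": "47", "name": "Brand"},
         "entity": {"id": "1284106964167884800", "name": "Leica"}}
      ]
    },
    {
      "id": "2",
      "text": "A tweet without any context annotations",
      "created_at": "2024-12-24T16:10:00.000Z",
      "public_metrics": {"like_count": 1, "retweet_count": 0,
                         "reply_count": 0, "quote_count": 0}
    }
  ]
}
"""

def extract(payload):
    """Extract: parse the JSON text and return the list of tweet dicts."""
    return json.loads(payload).get("data", [])

def transform(tweets):
    """Transform: flatten to one row per (tweet, context annotation) pair."""
    rows = []
    for tweet in tweets:
        metrics = tweet.get("public_metrics", {})
        for annotation in tweet.get("context_annotations", []):
            rows.append({
                "tweet_id": tweet["id"],
                "created_at": tweet["created_at"],
                "domain_name": annotation["domain"]["name"],
                "entity_name": annotation["entity"]["name"],
                "likes": metrics.get("like_count"),
                "retweets": metrics.get("retweet_count"),
            })
    frame = pd.DataFrame(rows)
    frame["created_at"] = pd.to_datetime(frame["created_at"], utc=True)
    return frame

# Load: the resulting DataFrame is what the analysis steps work on.
etl_df = transform(extract(raw_json))
print(etl_df)
```

In this sketch the second tweet contributes no rows because it has no context annotations, which mirrors how the flattening loop in this notebook behaves.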

In [1]:
import pandas as pd
import numpy as np

In [2]:
import tweepy

client = tweepy.Client(bearer_token='++++')
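
Rather than pasting the bearer token into the notebook, it can be read from an environment variable. The sketch below assumes a variable named `TWITTER_BEARER_TOKEN`; that name and the helper `get_bearer_token` are illustrative choices, not something tweepy requires.

```python
import os

def get_bearer_token(env_var="TWITTER_BEARER_TOKEN"):
    """Return the token stored in env_var, or raise if it is missing or empty."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the notebook."
        )
    return token

# Usage (not executed here because it needs a real token and network access):
# client = tweepy.Client(bearer_token=get_bearer_token())
```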

In [3]:
from datetime import datetime, timedelta

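# Hashtag search that excludes retweets and keeps English-language tweets only
query = '#mobilephotography -is:retweet lang:en'

# ISO 8601 timestamps in UTC. The recent search endpoint only covers roughly
# the last 7 days, so adjust this window when re-running the notebook.
start_time = "2024-12-24T11:00:00Z"
end_time = "2024-12-24T23:59:59Z"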

tweets = client.search_recent_tweets(
    query=query, 
    tweet_fields=['context_annotations', 'created_at', 'public_metrics'],  
    max_results=10,
    start_time=start_time,
    end_time=end_time
)

# Collects one row per (tweet, context annotation) pair
annotations_list = []


if tweets.data:
    for tweet in tweets.data:
        print(f"Tweet ID: {tweet.id}")
        print(f"Teks: {tweet.text}")
        print(f"Waktu Posting: {tweet.created_at}")
        print(f"Likes: {tweet.public_metrics['like_count']}")
        print(f"Retweets: {tweet.public_metrics['retweet_count']}")
        print(f"Replies: {tweet.public_metrics['reply_count']}")
        print(f"Quote Tweets: {tweet.public_metrics['quote_count']}")
        print("-" * 50)

        
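        # Flatten the annotations: each one becomes its own row, and tweets
        # without any context annotations add no rows to the DataFrame.
        if hasattr(tweet, 'context_annotations') and tweet.context_annotations: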
            for annotation in tweet.context_annotations:
                annotation_dict = {
                    'tweet_id': tweet.id,
                    'tweet_text': tweet.text,
                    'created_at': tweet.created_at,
                    'domain_id': annotation['domain']['id'],
                    'domain_name': annotation['domain']['name'],
                    'domain_description': annotation['domain'].get('description', 'No description'),
                    'entity_id': annotation['entity']['id'],
                    'entity_name': annotation['entity']['name'],
                    'entity_description': annotation['entity'].get('description', 'No description'),
                    'likes': tweet.public_metrics['like_count'],
                    'retweets': tweet.public_metrics['retweet_count'],
                    'replies': tweet.public_metrics['reply_count'],
                    'quotes': tweet.public_metrics['quote_count'],
                    "bookmark_count": tweet.public_metrics.get("bookmark_count", None),
                    "impression_count": tweet.public_metrics.get("impression_count", None),
                }
                annotations_list.append(annotation_dict)
else:
    print("No tweets found in the specified time range.")


df = pd.DataFrame(annotations_list)

Tweet ID: 1871697967146737913
Teks: By our member Abhik - ... of 1475 images taken over a time span of 7.5 h
Each photo 15" and 1600 iso
Taken with #Poco X5 Pro
Pro mode (leica authentic)
West Bengal, #India - #smartphone_astrophotography #mobilephotography #Stars #startrails https://t.co/5TJ75yDvoQ
Waktu Posting: 2024-12-24 23:22:24+00:00
Likes: 8
Retweets: 0
Replies: 0
Quote Tweets: 0
--------------------------------------------------
Tweet ID: 1871585921617080455
Teks: "Joynagarer Moya", the famous winter #SWEET of #bengal #Kolkata #India #mobilephotography https://t.co/eBjH6KxHTw
Waktu Posting: 2024-12-24 15:57:10+00:00
Likes: 9
Retweets: 3
Replies: 0
Quote Tweets: 0
--------------------------------------------------
Tweet ID: 1871581610199654417
Teks: #foodphotography #portraitphotography #landscapephotography #photos #weddingphotography #photoshop #blackandwhitephotography #photographylovers #canonphotography #filmphotography #photograph #mobilephotography #wildlifephotography 
C

In [4]:
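# String version of the timestamp, e.g. '2024-12-24 23:22:24+0000'
df['datetime'] = df['created_at'].dt.strftime('%Y-%m-%d %H:%M:%S%z')

# Calendar features (all in UTC) for grouping posts by day and hour
df['date'] = df['created_at'].dt.date
df['hour'] = df['created_at'].dt.hour
df['day_name'] = df['created_at'].dt.day_name()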

In [5]:
df.head()

| | tweet_id | tweet_text | created_at | domain_id | domain_name | domain_description | entity_id | entity_name | entity_description | likes | retweets | replies | quotes | bookmark_count | impression_count | datetime | date | hour | day_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1871697967146737913 | By our member Abhik - ... of 1475 images taken... | 2024-12-24 23:22:24+00:00 | 46 | Business Taxonomy | Categories within Brand Verticals that narrow ... | 1557696420500541440 | Automotive, Aircraft & Boat Business | Brands, companies, advertisers and every non-p... | 8 | 0 | 0 | 0 | 0 | 120 | 2024-12-24 23:22:24+0000 | 2024-12-24 | 23 | Tuesday |
| 1 | 1871697967146737913 | By our member Abhik - ... of 1475 images taken... | 2024-12-24 23:22:24+00:00 | 47 | Brand | Brands and Companies | 1284106964167884800 | Leica | No description | 8 | 0 | 0 | 0 | 0 | 120 | 2024-12-24 23:22:24+0000 | 2024-12-24 | 23 | Tuesday |
| 2 | 1871697967146737913 | By our member Abhik - ... of 1475 images taken... | 2024-12-24 23:22:24+00:00 | 30 | Entities [Entity Service] | Entity Service top level domain, every item th... | 854692455005921281 | Science | Science | 8 | 0 | 0 | 0 | 0 | 120 | 2024-12-24 23:22:24+0000 | 2024-12-24 | 23 | Tuesday |
| 3 | 1871697967146737913 | By our member Abhik - ... of 1475 images taken... | 2024-12-24 23:22:24+00:00 | 66 | Interests and Hobbies Category | A grouping of interests and hobbies entities, ... | 847899255880564736 | Space | Space and astronomy | 8 | 0 | 0 | 0 | 0 | 120 | 2024-12-24 23:22:24+0000 | 2024-12-24 | 23 | Tuesday |
| 4 | 1871697967146737913 | By our member Abhik - ... of 1475 images taken... | 2024-12-24 23:22:24+00:00 | 67 | Interests and Hobbies | Interests, opinions, and behaviors of individu... | 847869714860605440 | Photography | Photography | 8 | 0 | 0 | 0 | 0 | 120 | 2024-12-24 23:22:24+0000 | 2024-12-24 | 23 | Tuesday |


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 131 entries, 0 to 130
Data columns (total 19 columns):
 #   Column              Non-Null Count  Dtype              
---  ------              --------------  -----              
 0   tweet_id            131 non-null    int64              
 1   tweet_text          131 non-null    object             
 2   created_at          131 non-null    datetime64[ns, UTC]
 3   domain_id           131 non-null    object             
 4   domain_name         131 non-null    object             
 5   domain_description  131 non-null    object             
 6   entity_id           131 non-null    object             
 7   entity_name         131 non-null    object             
 8   entity_description  131 non-null    object             
 9   likes               131 non-null    int64              
 10  retweets            131 non-null    int64              
 11  replies             131 non-null    int64              
 12  quotes              131 non-null    

In [7]:
df.to_csv('mobilephotography_25des.csv', index=False)
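
A CSV file does not store column dtypes, so the timezone-aware `created_at` column comes back as plain text when the file is read again. The sketch below shows one way to restore it. It reads from an in-memory sample (`csv_text`, a made-up two-row excerpt with only a few of the columns) so that it runs on its own; with the real file, the path would replace the `io.StringIO(...)` argument.

```python
import io
import pandas as pd

# Hypothetical excerpt in the same layout as the saved file.
csv_text = (
    "tweet_id,created_at,entity_name,likes\n"
    "1871697967146737913,2024-12-24 23:22:24+00:00,Photography,8\n"
    "1871585921617080455,2024-12-24 15:57:10+00:00,Photography,9\n"
)

reloaded = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"tweet_id": str},  # tweet IDs are labels, so keep them as text
)

# Restore the timezone-aware timestamp and re-derive a time feature from it.
reloaded["created_at"] = pd.to_datetime(reloaded["created_at"], utc=True)
reloaded["hour"] = reloaded["created_at"].dt.hour
print(reloaded.dtypes)
```

Reading `tweet_id` as text is a design choice in this sketch: it avoids accidental arithmetic or float conversion on 19-digit identifiers.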

In [8]:
df.tweet_id.value_counts()

tweet_id
1871581610199654417    19
1871549640925208715    19
1871581517300015383    19
1871580556078821763    19
1871580435207352691    19
1871549748769182015    19
1871697967146737913    12
1871561793547665728     4
1871585921617080455     1
Name: count, dtype: int64
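
The counts above show that each tweet appears once per context annotation, so a tweet with 19 annotations occupies 19 rows with identical metrics. Summing `likes` over the whole frame would therefore over-count. The sketch below, built on a small made-up frame (`sample`) rather than the real data, shows one possible way to get tweet-level totals and per-category averages without double counting.

```python
import pandas as pd

# Hypothetical sample: one row per (tweet, annotation) pair, as in df.
sample = pd.DataFrame({
    "tweet_id": [101, 101, 101, 102, 102, 103],
    "entity_name": ["Photography", "Space", "Science",
                    "Photography", "Food", "Food"],
    "likes": [8, 8, 8, 9, 9, 2],
    "retweets": [0, 0, 0, 3, 3, 1],
})

# Naive total counts every annotation row, inflating the result.
naive_likes = int(sample["likes"].sum())

# Tweet-level view: keep a single row per tweet before summing metrics.
tweet_level = sample.drop_duplicates(subset="tweet_id")
total_likes = int(tweet_level["likes"].sum())

# Category-level view: each tweet counts once within a category.
per_category = (
    sample.drop_duplicates(subset=["tweet_id", "entity_name"])
    .groupby("entity_name")
    .agg(
        tweets=("tweet_id", "nunique"),
        avg_likes=("likes", "mean"),
        avg_retweets=("retweets", "mean"),
    )
    .sort_values("avg_likes", ascending=False)
)
print(naive_likes, total_likes)
print(per_category)
```

A tweet can still belong to several categories at once, so the per-category rows overlap and should not be added together.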