# **Problem Statement:** Instagram Reach Analysis

## **Description:**
* **Instagram is one of the most popular social media applications today. People who use it professionally promote their businesses, build portfolios, blog, and create many kinds of content. Because Instagram serves millions of people across different niches, it keeps changing to work better for content creators and users. Those changes affect the reach of our posts, which matters in the long run. A content creator who wants to do well on Instagram over time therefore has to study their reach data, and that is where data science comes in. This article walks through Instagram reach analysis using Python to help content creators understand how to adapt to changes on Instagram.**

# 1. Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from sklearn.model_selection import train_test_split
from sklearn.linear_model import PassiveAggressiveRegressor
import warnings
warnings.filterwarnings('ignore')

# 2. The DataSets

# 2.1. Datasets Information
* **Impressions:** Number of impressions in a post (Reach)
* **From Home:** Reach from home
* **From Hashtags:** Reach from Hashtags
* **From Explore:** Reach from Explore
* **From Other:** Reach from other sources
* **Saves:** Number of saves
* **Comments:** Number of comments
* **Shares:** Number of shares
* **Likes:** Number of Likes
* **Profile Visits:** Number of profile visits from the post
* **Follows:** Number of Follows from the post
* **Caption:** Caption of the post
* **Hashtags:** Hashtags used in the post


* **Note:** Here’s the Instagram Data we collected from the account of the founder of Statso.
* [DataSets Link (Click Me)](https://statso.io/instagram-reach-analysis-case-study/)

# 2.2. Reading Datasets

In [None]:
#from google.colab import drive
#drive.mount('/content/drive')

In [None]:
#df=pd.read_csv("/content/drive/MyDrive/Colab Notebooks/DS_PROJECT/Instagram_Reach_Analysis/Instagram data.csv",encoding='cp1252')
df=pd.read_csv("Instagram_data.csv",encoding='cp1252')
df.head()

# 2.3. Data Exploration

In [None]:
df.info()

* **All features are numeric except Caption and Hashtags, which are of type object.**

* **Let's look at the summary statistics of the numeric features:**

In [None]:
df.describe()
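
The split between numeric and text columns noted above can also be checked programmatically. The following is a minimal sketch on a hypothetical three-row stand-in for the dataset (the values are made up, and `sample` is not the real `df`); it assumes only that the text columns are non-numeric.

```python
import pandas as pd

# Hypothetical stand-in for the Instagram dataset; values are invented.
sample = pd.DataFrame({
    "Impressions": [3920, 5394, 4021],
    "Likes": [162, 224, 131],
    "Caption": ["first post", "second post", "third post"],
    "Hashtags": ["#data #python", "#ml", "#python #ai"],
})

# Group the column names by dtype family.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Text:", text_cols)
```

Running the same two `select_dtypes` calls on the real DataFrame should list Caption and Hashtags as the only text columns.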

# 3. Handling Null Value

In [None]:
df.isnull().sum()

* **There are no null values.**
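
This dataset needs no cleaning, but if a future export did contain gaps, one simple option is to drop the incomplete rows. The sketch below uses a hypothetical toy frame with deliberately missing values; dropping rows is only one possible strategy and is an assumption here, not a step from the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with missing entries (not the real dataset).
toy = pd.DataFrame({
    "Likes": [120, np.nan, 98],
    "Caption": ["a", "b", None],
})

# Count missing values per column, as in the cell above.
null_counts = toy.isnull().sum()

# Keep only the rows in which every column is present.
cleaned = toy.dropna()

print(null_counts)
print(len(cleaned), "complete rows remain")
```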

# 4. Data Visualization

In [None]:
plt.style.use('dark_background')
plt.rcParams.update({'text.color':'white'})

In [None]:
df.columns.unique()

In [None]:
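plt.figure(figsize=(9,5))
plt.title("Distribution of Impressions From Home",weight="bold",color='red')
# sns.distplot is deprecated; histplot with a KDE overlay draws the equivalent plot
sns.histplot(df['From Home'],kde=True,stat='density',color='pink')
plt.show()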

* **The impressions I get from the home section on Instagram show how many of my followers my posts reach. Looking at the impressions from home, I can say it’s hard to reach all my followers daily.**

* **Now let’s have a look at the distribution of the impressions I received from hashtags:**

In [None]:
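plt.figure(figsize=(9,5))
plt.title("Distribution of Impressions From Hashtags",weight="bold",color='red')
# sns.distplot is deprecated; histplot with a KDE overlay draws the equivalent plot
sns.histplot(df['From Hashtags'],kde=True,stat='density',color='green')
plt.show()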

* **Hashtags are tools we use to categorize our posts on Instagram so that we can reach more people based on the kind of content we create. The hashtag impressions show that hashtags do not bring reach to every post, but many new users can be reached through them.**

* **Now let’s have a look at the distribution of impressions I have received from the explore section of Instagram:**

In [None]:
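plt.figure(figsize=(9,5))
plt.title("Distribution of Impressions From Explore",weight="bold",color='red')
# sns.distplot is deprecated; histplot with a KDE overlay draws the equivalent plot
sns.histplot(df['From Explore'],kde=True,stat='density',color='green')
plt.show()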

* **The explore section of Instagram is the recommendation system of Instagram. It recommends posts to the users based on their preferences and interests. By looking at the impressions I have received from the explore section, I can say that Instagram does not recommend our posts much to the users. Some posts have received a good reach from the explore section, but it’s still very low compared to the reach I receive from hashtags.**

* **Now let’s have a look at the percentage of impressions I get from various sources on Instagram:**

In [None]:
# Total impressions per source across all posts
home=df['From Home'].sum()
hashtags=df['From Hashtags'].sum()
explore=df['From Explore'].sum()
other=df['From Other'].sum()

labels=['From Home','From Hashtags','From Explore','From Other']
values=[home,hashtags,explore,other]

# hole=0.5 turns the pie into a donut chart; no matplotlib figure is needed for plotly
fig=px.pie(values=values,names=labels,
           title='Impressions on Instagram Posts From Various Sources',hole=0.5)
fig.show()

* **So the above donut plot shows that almost 50 per cent of the reach is from my followers, 38.1 per cent is from hashtags, 9.14 per cent is from the explore section, and 3.01 per cent is from other sources.**
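
The percentages in the chart come from dividing each source's total by the grand total. A minimal sketch of that arithmetic, using hypothetical per-post numbers (the `reach` frame below is invented and will not reproduce the figures quoted above):

```python
import pandas as pd

# Hypothetical per-post impressions by source; values are invented.
reach = pd.DataFrame({
    "From Home": [500, 300, 200],
    "From Hashtags": [300, 200, 100],
    "From Explore": [150, 100, 50],
    "From Other": [50, 30, 20],
})

# Sum each source over all posts, then express it as a share of the total.
totals = reach.sum()
share_pct = (totals / totals.sum() * 100).round(2)

print(share_pct)
```

Applying the same two lines to the four "From ..." columns of the real DataFrame gives the shares as numbers instead of reading them off the chart.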

# 5. Analyzing Content

* **Now let’s analyze the content of my Instagram posts. The dataset has two columns, namely caption and hashtags, which will help us understand the kind of content I post on Instagram.**

* **Let’s create a wordcloud of the caption column to look at the most used words in the caption of my Instagram posts:**

In [None]:
text=' '.join(i for i in df.Caption)
stopwords=set(STOPWORDS)
wordcloud=WordCloud(stopwords=stopwords).generate(text)
plt.style.use('classic')
plt.figure(figsize=(12,10))
plt.imshow(wordcloud,interpolation='bilinear')
plt.axis("off")
plt.show()

* **Now let’s create a wordcloud of the hashtags column to look at the most used hashtags in my Instagram posts:**

In [None]:
text=' '.join(i for i in df.Hashtags)
stopwords=set(STOPWORDS)
wordcloud=WordCloud(stopwords=stopwords).generate(text)
plt.style.use('classic')
plt.figure(figsize=(12,10))
plt.imshow(wordcloud,interpolation='bilinear')
plt.axis("off")
plt.show()
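
A word cloud shows relative frequency only by font size. For exact counts, a plain frequency table is a useful complement. The sketch below assumes each hashtag cell is a whitespace-separated string of tags; `hashtag_cells` is a hypothetical stand-in for the Hashtags column.

```python
from collections import Counter

# Hypothetical stand-in for df["Hashtags"]; each entry is one post's hashtags.
hashtag_cells = [
    "#python #datascience #machinelearning",
    "#python #ai",
    "#datascience #python",
]

# Split every cell on whitespace and count each tag across all posts.
counts = Counter(tag for cell in hashtag_cells for tag in cell.split())

# The two most frequent tags with their counts.
top_tags = counts.most_common(2)
print(top_tags)
```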

# 6. Analyzing Relationships

* **Now let’s analyze relationships to find the most important factors of our Instagram reach. It will also help us in understanding how the Instagram algorithm works.**

* **Let’s have a look at the relationship between the number of likes and the number of impressions on my Instagram posts:**

In [None]:
figure = px.scatter(data_frame = df, x="Impressions",
                    y="Likes", size="Likes", trendline="ols", 
                    title = "Relationship Between Likes and Impressions",template="plotly_dark")
figure.show()

* **There is a linear relationship between the number of likes and the reach I got on Instagram**
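
The `trendline="ols"` option draws an ordinary least squares line through the points. To see the slope and intercept behind such a line, a degree-1 polynomial fit does the same job. The sketch below uses `np.polyfit` as a stand-in for plotly's internal fit, on hypothetical numbers rather than the real posts.

```python
import numpy as np

# Hypothetical impressions (x) and likes (y) for four posts.
impressions = np.array([1000.0, 2000.0, 3000.0, 4000.0])
likes = np.array([52.0, 98.0, 151.0, 199.0])

# A degree-1 least squares fit returns the slope first, then the intercept.
slope, intercept = np.polyfit(impressions, likes, deg=1)

# Likes predicted by the fitted line for each post.
predicted = slope * impressions + intercept

print(f"likes = {slope:.4f} * impressions + {intercept:.2f}")
```

A slope near zero would indicate that the variable on the y-axis barely changes as impressions grow.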

* **Now let’s see the relationship between the number of comments and the number of impressions on my Instagram posts:**

In [None]:
figure = px.scatter(data_frame = df, x="Impressions",
                    y="Comments", size="Comments", trendline="ols", 
                    title = "Relationship Between Comments and Total Impressions",template="plotly_dark")
figure.show()

* **It looks like the number of comments we get on a post doesn’t affect its reach.**

* **Now let’s have a look at the relationship between the number of shares and the number of impressions:**

In [None]:
figure = px.scatter(data_frame = df, x="Impressions",
                    y="Shares", size="Shares", trendline="ols", 
                    title = "Relationship Between Shares and Total Impressions",template="plotly_dark")
figure.show()

* **More shares result in a higher reach, but shares don’t affect the reach of a post as much as likes do.**

* **Now let’s have a look at the relationship between the number of saves and the number of impressions:**

In [None]:
figure = px.scatter(data_frame = df, x="Impressions",
                    y="Saves", size="Saves", trendline="ols", 
                    title = "Relationship Between Post Saves and Total Impressions",template="plotly_dark")
figure.show()

* **There is a linear relationship between the number of times my post is saved and the reach of my Instagram post.**

* **Now let’s have a look at the correlation of all the columns with the Impressions column:**

In [None]:
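# numeric_only=True skips the text columns (Caption, Hashtags); pandas >= 2.0 raises an error on them otherwise
correlation = df.corr(numeric_only=True)
print(correlation["Impressions"].sort_values(ascending=False))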

* **So we can say that more likes and saves will help you get more reach on Instagram. A higher number of shares will also help, but a low number of shares will not hurt your reach either.**
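
The ranking above is based on the Pearson correlation coefficient. As a rough illustration of what that number measures, the sketch below computes it by hand for one pair of columns and compares it with the pandas result. The `toy` frame is hypothetical, and `numeric_only=True` assumes pandas 1.5 or newer.

```python
import numpy as np
import pandas as pd

# Hypothetical posts; values are invented for illustration.
toy = pd.DataFrame({
    "Impressions": [1000, 2000, 3000, 4000, 5000],
    "Likes": [50, 110, 140, 210, 240],
    "Comments": [5, 3, 6, 4, 5],
    "Caption": ["a", "b", "c", "d", "e"],
})

# Correlation of every numeric column with Impressions, strongest first.
ranking = toy.corr(numeric_only=True)["Impressions"].sort_values(ascending=False)

# Pearson r by hand: covariance of the centered columns over the product of their norms.
x = toy["Impressions"] - toy["Impressions"].mean()
y = toy["Likes"] - toy["Likes"].mean()
manual_r = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

print(ranking)
print("manual r for Likes:", manual_r)
```

Keep in mind that a high correlation shows that two columns move together, not that one causes the other.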

# 7. Analyzing Conversion Rate

#### **On Instagram, conversion rate means how many followers you gain out of the profile visits generated by a post. The formula that you can use to calculate conversion rate is (Follows/Profile Visits) * 100. Now let’s have a look at the conversion rate of my Instagram account:**

In [None]:
conversion_rate = (df["Follows"].sum() / df["Profile Visits"].sum()) * 100
print(conversion_rate)

* **So the conversion rate of my Instagram account is 41%, which sounds like a very good conversion rate.**
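
The formula can be wrapped in a small helper so it can be reused per post or per account. This is a sketch only: the function name `conversion_rate` and the choice to return 0.0 when there are no profile visits are assumptions, and the numbers in the example call are made up.

```python
def conversion_rate(follows: int, profile_visits: int) -> float:
    """Return follows as a percentage of profile visits."""
    # Guard against division by zero for posts with no profile visits (assumed behavior).
    if profile_visits == 0:
        return 0.0
    return follows / profile_visits * 100


# Hypothetical example: 50 follows out of 200 profile visits.
rate = conversion_rate(50, 200)
print(f"{rate:.1f}%")
```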

In [None]:
figure = px.scatter(data_frame = df, x="Profile Visits",
                    y="Follows", size="Follows", trendline="ols", 
                    title = "Relationship Between Profile Visits and Followers Gained",template="plotly_dark")
figure.show()

* **The relationship between profile visits and followers gained is also linear.**

# 8. Model

* **Now in this section, I will train a machine learning model to predict the reach of an Instagram post.** 
* **Let’s split the data into training and test sets before training the model:**

In [None]:
x = np.array(df[['Likes', 'Saves', 'Comments', 'Shares', 
                   'Profile Visits', 'Follows']])
y = np.array(df["Impressions"])
x_train, x_test, y_train, y_test = train_test_split(x, y, 
                                                test_size=0.2, 
                                                random_state=42)

* **Here’s how we can train a machine learning model to predict the reach of an Instagram post using Python:**

In [None]:
model = PassiveAggressiveRegressor()
model.fit(x_train, y_train)
# score() returns the R^2 coefficient of determination on the held-out test set
model.score(x_test, y_test)
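
For scikit-learn regressors, `score` reports the R² coefficient of determination: 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below reproduces that calculation by hand on hypothetical true and predicted impressions and checks it against `sklearn.metrics.r2_score`; no model is trained here.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true and predicted impressions for four test posts.
y_true = np.array([3000.0, 4500.0, 5200.0, 8000.0])
y_pred = np.array([3200.0, 4300.0, 5600.0, 7500.0])

# Residual sum of squares: error left over after the predictions.
ss_res = np.sum((y_true - y_pred) ** 2)

# Total sum of squares: error of always predicting the mean.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

manual_r2 = 1 - ss_res / ss_tot
print(manual_r2, r2_score(y_true, y_pred))
```

A score close to 1 means the predictions explain most of the variation in reach, while a score at or below 0 means the model does no better than predicting the average.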

# 9. Testing

* **Now let’s predict the reach of an Instagram post by giving inputs to the machine learning model:**

In [None]:
# Features = [['Likes','Saves', 'Comments', 'Shares', 'Profile Visits', 'Follows']]
features = np.array([[282.0, 233.0, 4.0, 9.0, 165.0, 54.0]])
model.predict(features)
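
The model expects the six inputs in exactly the order used for training, which is easy to get wrong when typing a bare list. One way to reduce that risk is to build the row from named values. In the sketch below, `FEATURE_ORDER` and `to_feature_row` are hypothetical helpers, not part of the original notebook; the values match the example above.

```python
import numpy as np

# Column order used when the training matrix was built.
FEATURE_ORDER = ["Likes", "Saves", "Comments", "Shares", "Profile Visits", "Follows"]


def to_feature_row(post: dict) -> np.ndarray:
    """Turn a dict of named metrics into a 1 x 6 array in training order."""
    missing = [name for name in FEATURE_ORDER if name not in post]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return np.array([[float(post[name]) for name in FEATURE_ORDER]])


# The keys can be given in any order; the helper rearranges them.
post = {"Follows": 54, "Likes": 282, "Saves": 233,
        "Comments": 4, "Shares": 9, "Profile Visits": 165}
features = to_feature_row(post)
print(features)
```

The resulting array can be passed to `model.predict` in place of the hand-written one.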

# **Reference**
* [Aman Kharwal (medium.com)](https://amankharwal.medium.com/)

# **THANK YOU**