# ***Instagram Reach Analysis Using Python***

## Let's start analyzing the reach of an Instagram account by importing the necessary Python libraries

In [7]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# Importing the dataset: ***Instagram***

In [8]:
data=pd.read_csv("Instagram.csv", encoding = 'latin1' )
print(data.head())

### Let's check for any null values in this dataset

In [9]:
data.isnull().sum()

### This dataset doesn't have any null values.

In [None]:
# If any null values were found, drop the affected rows:
# data = data.dropna()

### Let's look at the column summary to understand the data type of each column

In [None]:
data.info()

# Analyzing the Reach of the Instagram Posts

### Let's look at the distribution of impressions the posts received from the home feed

In [None]:
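plt.style.use('fivethirtyeight')
plt.figure(figsize=(10,8))
plt.title("Distribution of Impressions From Home")
# sns.distplot is deprecated; histplot with kde=True and stat="density" draws a comparable density-scaled histogram
sns.histplot(data['From Home'], kde=True, stat="density")
plt.show()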

### The impressions from the home section show how far the posts reach one's followers. Judging by the home impressions, it seems hard to reach all followers every day.

### Now let’s have a look at the distribution of the impressions received from hashtags:

In [None]:
plt.figure(figsize=(10,8))
plt.title("Distribution of Impressions From Hashtags")
sns.histplot(data['From Hashtags'], kde=True, stat="density")
plt.show()

### Hashtags are tools we use to categorize our posts on Instagram so that we can reach more people interested in the kind of content we create. The hashtag impressions show that not every post gets reach from hashtags, but hashtags can bring in many new users.

### Now let's look at the distribution of impressions received from the Explore section of Instagram:

In [None]:
plt.figure(figsize=(10,8))
plt.title("Distribution of Impressions From Explore Section")
sns.histplot(data['From Explore'], kde=True, stat="density")
plt.show()

### The Explore section is Instagram's recommendation system. It recommends posts to users based on their preferences and interests.

### Judging by the impressions received from the Explore section, Instagram doesn't recommend our posts to users very often. Some posts received good reach from Explore, but it's still very low compared to the reach received from hashtags.

### Now we will have a look at the percentage of impressions we got from various sources on Instagram:

In [None]:
home=data["From Home"].sum()
hashtags=data["From Hashtags"].sum()
explore=data["From Explore"].sum()
other=data["From Other"].sum()

labels=['From Home','From Hashtags','From Explore','Other']
values=[home,hashtags,explore,other]

# values and names are plain lists, so no DataFrame needs to be passed here
fig=px.pie(values=values,names=labels,title='Impressions on Instagram Posts From Various Sources', hole=0.5)
fig.show()

### So the donut chart above shows that 44.1% of the reach comes from my followers (home), 33.6% from hashtags, 19.12% from the Explore section, and 3.05% from other sources.
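The percentages in the donut chart are each source's total impressions divided by the combined total of the four sources. A minimal sketch of that arithmetic with pandas, assuming a small hypothetical `toy` DataFrame with made-up numbers in place of the real dataset:

```python
import pandas as pd

# Hypothetical toy data standing in for the real "Instagram.csv" columns
toy = pd.DataFrame({
    "From Home": [2000, 3000, 1500],
    "From Hashtags": [1000, 2500, 900],
    "From Explore": [300, 1200, 100],
    "From Other": [50, 100, 30],
})

sources = ["From Home", "From Hashtags", "From Explore", "From Other"]
totals = toy[sources].sum()                # total impressions per source
percentages = totals / totals.sum() * 100  # each source's share of all impressions
print(percentages.round(2))
```

The percentages always add up to 100 because they are taken relative to the sum of these four sources only.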

# Analyzing Content

### Now we'll analyze the content of the Instagram posts. The dataset has two columns, Caption and Hashtags, that help us understand the kind of content posted on Instagram.

### Let’s create a word cloud of the Caption column to look at the most used words in the captions of my Instagram posts:

In [None]:
# dropna() guards against posts without a caption; join() fails on NaN values
text=" ".join(str(i) for i in data.Caption.dropna())
stopwords=set(STOPWORDS)
wordcloud=WordCloud(stopwords=stopwords,background_color="white").generate(text)
plt.style.use('classic')
plt.figure(figsize=(12,10))
plt.imshow(wordcloud,interpolation='bilinear')
plt.axis("off")
plt.show()

### Now let’s create a word cloud of the Hashtags column to look at the most used hashtags in the Instagram posts:

In [None]:
text=" ".join(str(i) for i in data.Hashtags.dropna())
stopwords=set(STOPWORDS)
wordcloud=WordCloud(stopwords=stopwords,background_color="white").generate(text)
plt.figure(figsize=(12,10))
plt.imshow(wordcloud,interpolation='bilinear')
plt.axis("off")
plt.show()
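
A word cloud makes exact frequencies hard to read. A minimal sketch that counts tokens directly with `collections.Counter`, assuming a hypothetical `hashtags` Series in place of `data["Hashtags"]` and a hypothetical helper `top_tokens`. It splits on whitespace, which is simpler than WordCloud's own tokenizer, so its counts may not match the cloud exactly:

```python
from collections import Counter

import pandas as pd

# Hypothetical hashtag strings; the notebook reads the real ones from data["Hashtags"]
hashtags = pd.Series([
    "#python #datascience #machinelearning",
    "#python #dataviz",
    None,                                   # a missing value, skipped by dropna()
    "#datascience #python",
])

def top_tokens(series, n=3):
    """Count whitespace-separated tokens across all rows and return the n most common."""
    counts = Counter()
    for text in series.dropna().astype(str):
        counts.update(text.lower().split())
    return counts.most_common(n)

print(top_tokens(hashtags))
```

The same helper can be applied to the Caption column, although captions would also need stopwords filtered out.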

# Analyzing Relationships

### Analyzing relationships helps us find the factors that matter most for our Instagram reach. It can also give some hints about how the Instagram algorithm works.

### Let's look at the relationship between the number of likes and the number of impressions on the Instagram posts:

In [None]:
figure=px.scatter(data_frame=data,x="Impressions",y="Likes",size="Likes",trendline="ols",title="Relationship Between Likes and Impressions")
figure.show()

### The relationship between the number of likes and the number of impressions appears to be roughly linear.
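Plotly Express's `trendline="ols"` draws an ordinary least squares line (it relies on statsmodels for the fit). A minimal sketch of the same degree-1 least-squares fit with NumPy, plus the Pearson correlation coefficient that quantifies how close the points are to a straight line, assuming hypothetical impressions/likes arrays rather than the real data:

```python
import numpy as np

# Hypothetical impressions/likes pairs, not taken from the real dataset
impressions = np.array([2000, 4000, 6000, 8000, 10000], dtype=float)
likes = np.array([110, 190, 310, 390, 500], dtype=float)

# Degree-1 least-squares fit: likes ≈ slope * impressions + intercept
slope, intercept = np.polyfit(impressions, likes, deg=1)

# Pearson correlation coefficient: values near 1 mean the points lie close to a rising line
r = np.corrcoef(impressions, likes)[0, 1]

print(f"slope={slope:.4f}, intercept={intercept:.1f}, r={r:.3f}")
```

An r close to 1 is what "linear relationship" means in the plots in this section; a flat trendline with r near 0 corresponds to the comments case.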

### Now let’s see the relationship between the number of comments and the number of impressions on the Instagram posts:

In [None]:
figure=px.scatter(data_frame=data,x="Impressions",y="Comments",size="Comments",trendline="ols",title="Relationship Between Comments and Total Impressions")
figure.show()

### The number of comments a post gets doesn't appear to affect its reach.

### Now let’s have a look at the relationship between the number of shares and the number of impressions:

In [None]:
figure=px.scatter(data_frame=data, x="Impressions",y="Shares",size="Shares",trendline="ols",title="Relationship Between Shares and Total Impressions")
figure.show()

### More shares tend to go with higher reach, but shares appear to affect a post's reach less than likes do.

### Now let’s have a look at the relationship between the number of saves and the number of impressions:

In [None]:
figure=px.scatter(data_frame=data, x="Impressions",y="Saves",size="Saves",trendline="ols",title="Relationship Between Saves and Total Impressions")
figure.show()

### The number of times a post is saved also appears to have a roughly linear relationship with its reach.

### Now let’s have a look at the correlation of all the columns with the Impressions column:

In [None]:
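# numeric_only=True skips text columns such as Caption and Hashtags (pandas 2.x raises an error on them otherwise)
correlation=data.corr(numeric_only=True)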
print(correlation["Impressions"].sort_values(ascending=False))

### So more likes and saves appear to help a post get more reach on Instagram. More shares also help, but a low number of shares does not seem to hurt a post's reach much.
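Since pandas 2.0, `DataFrame.corr` no longer silently drops non-numeric columns, so a dataset that mixes numbers with text (like Caption and Hashtags) needs `numeric_only=True`. A minimal sketch on a hypothetical `toy` frame, showing the text column being excluded and the correlations with Impressions ranked:

```python
import pandas as pd

# Hypothetical toy data mixing numeric and text columns
toy = pd.DataFrame({
    "Impressions": [1000, 2000, 3000, 4000],
    "Likes": [50, 90, 160, 200],
    "Comments": [5, 3, 6, 4],
    "Caption": ["a", "b", "c", "d"],   # text column: excluded by numeric_only=True
})

# Pearson correlation of every numeric column with Impressions, minus its self-correlation
corr = toy.corr(numeric_only=True)["Impressions"].drop("Impressions")
print(corr.sort_values(ascending=False))
```

In this toy frame Likes correlates strongly with Impressions while Comments does not correlate at all, which mirrors the kind of ranking the real correlation table is used for above.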

# Analyzing Conversion Rate

### On Instagram, the conversion rate measures how many followers you gain from the profile visits a post generates. It can be calculated as (Follows / Profile Visits) * 100.

### Now let’s look at the conversion rate of our Instagram account:

In [None]:
conversion_rate=(data["Follows"].sum()/data["Profile Visits"].sum())*100
print(conversion_rate)

### So the conversion rate of our Instagram account is 41%, which seems very good.
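The formula above pools all posts: total follows divided by total profile visits. Averaging each post's own rate gives a different number, because the pooled figure weights posts by how many visits they drew. A minimal sketch comparing the two, assuming a hypothetical `posts` DataFrame with made-up values:

```python
import pandas as pd

# Hypothetical per-post numbers, not the real dataset
posts = pd.DataFrame({
    "Profile Visits": [100, 20, 50],
    "Follows": [40, 2, 30],
})

# Pooled rate, as in the notebook: total follows / total profile visits * 100
aggregate_rate = posts["Follows"].sum() / posts["Profile Visits"].sum() * 100

# Unweighted mean of each post's own rate (every post counts equally);
# posts with zero profile visits would need to be excluded first to avoid dividing by zero
per_post_rate = (posts["Follows"] / posts["Profile Visits"] * 100).mean()

print(round(aggregate_rate, 2), round(per_post_rate, 2))
```

Neither number is wrong; the pooled rate answers "what share of all visits converted", while the per-post mean answers "how well does a typical post convert".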

### Let’s have a look at the relationship between the total profile visits and the number of followers gained from all profile visits:

In [None]:
figure=px.scatter(data_frame=data,x="Profile Visits",y="Follows",size="Follows",trendline="ols",title="Relationship Between Profile Visits and Followers Gained")
figure.show()

### The relationship between profile visits and followers gained also appears to be linear.