## **Play Store Apps Analysis**
Imagine you're diving into a large dataset from the Google Play Store. You'll use Python, with a LangChain agent backed by an OpenAI model, to explore and clean it.

First, the data is cleaned: missing values are filled, duplicates and anomalous rows are removed, and unneeded columns are dropped.

Once the data is clean, SQL can be used to find patterns and correlations in it. In this project, however, we work only on the cleaning tasks.

Visualizations such as charts and graphs can then make patterns and trends in the cleaned data easier to see.

By the end, you'll have practiced the cleaning steps that reliable, data-driven decisions depend on.

Join this adventure as you explore the Google Play Store dataset. Get ready to find out all sorts of interesting things about your favorite apps!

### **Install the required libraries**
**langchain**: An open source framework for building applications based on large language models (LLMs).

**langchain_experimental**: A companion package to langchain that holds experimental features. This notebook uses its `create_pandas_dataframe_agent` function.

**openai**: OpenAI's Python library for calling the OpenAI API, whose models handle tasks such as text generation, text classification, and language translation.

In [1]:
!pip install langchain
!pip install langchain_experimental
!pip install openai

Collecting langchain
  Downloading langchain-0.1.11-py3-none-any.whl (807 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl (28 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langchain-community<0.1,>=0.0.25 (from langchain)
  Downloading langchain_community-0.0.27-py3-none-any.whl (1.8 MB)

### Task 1: Data Preparation: Importing Libraries, Setting up API Key, and Reading Data
Before we start our analysis, let's ensure we have everything we need. First, we'll import the required libraries. Then, we'll set up our GPT-3.5 API key for access. Finally, we'll read the data from our CSV file and get everything ready.




**To obtain an API key for OpenAI, follow these steps:**

* Log in to your OpenAI account at the [OpenAI Login Page](https://platform.openai.com/login?launch).

* Generate and retrieve your API key by visiting the [API Key Generation Page](https://platform.openai.com/api-keys).



Download the [Playstore Datasets](https://drive.google.com/drive/folders/1l_TabQZpTRypVKwVcHoD98YSAyXZOu91?usp=sharing).



In [2]:
import os
import pandas as pd
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.llms import OpenAI

# Set up the API key (paste your own key between the quotes; never commit it)
os.environ['OPENAI_API_KEY'] = ""

# Load the dataset (playstore_apps.csv) and assign it to the variable 'df'
df = pd.read_csv('playstore_apps.csv')


### Task 2: Display Top 5 Records.

Great! Our next step is to initialize the agent (backed by OpenAI). Once that's done, we'll write a prompt to display the first 5 records of our data table.

In [3]:
# Initializing the agent
agent = create_pandas_dataframe_agent(OpenAI(temperature=0),
              df, verbose=True)

# write your prompt using the agent
agent("""Display first 5 records from df""")





> Entering new AgentExecutor chain...
Thought: I need to use the head() function to display the first 5 records
Action: python_repl_ast
Action Input: print(df.head())
                                                 App        Category  Rating  \
0     Photo Editor & Candy Camera & Grid & ScrapBook  ART_AND_DESIGN     4.1   
1                                Coloring book moana  ART_AND_DESIGN     3.9   
2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN     4.7   
3                              Sketch - Draw & Paint  ART_AND_DESIGN     4.5   
4              Pixel Draw - Number Art Coloring Book  ART_AND_DESIGN     4.3   

    Reviews  Size    Installs  Type  Price Content Rating  \
0     159.0   19M     10000.0  Free    0.0       Everyone   
1     967.0   14M    500000.0  Free    0.0       Everyone   
2   87510.0  8.7M   5000000.0  Free    0.0       Everyone   
3  215644.0   25M  50000000.0  Free    0.0           Teen   
4     96

{'input': 'Display first 5 records from df',
 'output': 'The first 5 records from df have been displayed using the head() function.'}

### Task 3: Determine the Shape (Count Rows and Columns) of the Dataset.
Great! For our next step, let's create a prompt to gather information about the shape of our playstore app dataset. This will help us understand how many rows and columns it contains.

In [4]:
# write your prompt using the agent
agent("shape of df")



> Entering new AgentExecutor chain...
Thought: I need to find the shape of the dataframe, which will tell me the number of rows and columns.
Action: python_repl_ast
Action Input: df.shape
(10841, 13)
10841 rows and 13 columns
Final Answer: The shape of the dataframe is (10841, 13).

> Finished chain.


{'input': 'shape of df',
 'output': 'The shape of the dataframe is (10841, 13).'}

### Task 4: Finding the Missing data.
Wow! We've successfully found the shape of the dataset. Now, let's create a prompt asking our agent for details about any missing values in the data table.

In [5]:
# write your prompt using the agent
agent("number of null values in df")



> Entering new AgentExecutor chain...
Thought: I need to use a pandas function to count the null values in the dataframe
Action: python_repl_ast
Action Input: df.isnull().sum()
App                  0
Category             0
Rating            1474
Reviews              1
Size                 0
Installs             1
Type                 1
Price                1
Content Rating       1
Genres               0
Last Updated         1
Current Ver          8
Android Ver          3
dtype: int64
I now know the final answer
Final Answer: There are 1491 null values in the dataframe.

> Finished chain.


{'input': 'number of null values in df',
 'output': 'There are 1491 null values in the dataframe.'}
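If you'd like to double-check the agent's count without another API call, the same numbers can be reproduced directly in pandas. The sketch below is only an illustration: `toy` is a hypothetical three-row stand-in for the Play Store data, not the real file. It shows the per-column counts and the single grand total that the agent reported.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Play Store data.
toy = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Rating": [4.1, np.nan, np.nan],
    "Type": ["Free", None, "Paid"],
})

# One null count per column, as in the agent's trace.
nulls_per_column = toy.isnull().sum()

# A single grand total, which is what the agent's final answer reports.
total_nulls = int(nulls_per_column.sum())

print(nulls_per_column)
print(total_nulls)
```

The per-column view is usually the more useful one for cleaning, because it tells you which columns need attention.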

### Task 5: Identifying Duplicate Rows.
Fantastic! We've successfully found the missing values. Now, let's create a prompt to ask our agent to count the number of duplicate rows in the playstore app dataset.

In [6]:
# write your prompt using the agent
agent("number of duplicate rows for each column in df")



> Entering new AgentExecutor chain...
Thought: I need to find a way to count the number of duplicate rows for each column in the dataframe
Action: python_repl_ast
Action Input: df.duplicated().sum()
483
483 is the total number of duplicate rows in the dataframe
Final Answer: The number of duplicate rows for each column in df is 483.

> Finished chain.


{'input': 'number of duplicate rows for each column in df',
 'output': 'The number of duplicate rows for each column in df is 483.'}
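Notice that the prompt asks for duplicates "for each column", while the trace shows the agent running `df.duplicated().sum()`, which counts whole-row duplicates. The sketch below, on a hypothetical `toy` frame, shows how the two readings differ. It is an illustration of the distinction, not the agent's code.

```python
import pandas as pd

# Hypothetical toy frame; the second row repeats the first exactly.
toy = pd.DataFrame({
    "App": ["A", "A", "B", "C"],
    "Category": ["GAME", "GAME", "GAME", "TOOLS"],
})

# Whole-row duplicates: rows identical to an earlier row in every column.
duplicate_rows = int(toy.duplicated().sum())

# Per-column repeats: values already seen earlier within that one column.
repeats_per_column = toy.apply(lambda col: int(col.duplicated().sum()))

print(duplicate_rows)
print(repeats_per_column)
```

For cleaning, the whole-row count is the one that matters, so a prompt such as "number of duplicate rows in df" states the intent more directly.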

### Task 6: Dropping Unnecessary Columns.
Fantastic! We successfully found the duplicates. Now, let's create a prompt to ask our agent to remove the Last Updated, Current Ver, Android Ver and Size columns from the playstore dataset.

In [7]:
# write your prompt using the agent
agent("remove the column 'Last Updated', 'Current Ver', 'Android Ver' and 'Size' in df. Update df")



> Entering new AgentExecutor chain...
Thought: I need to use the drop function to remove the specified columns
Action: python_repl_ast
Action Input: df.drop(['Last Updated', 'Current Ver', 'Android Ver', 'Size'], axis=1, inplace=True)
I now know the final answer
Final Answer: The updated dataframe with the specified columns removed.

> Finished chain.


{'input': "remove the column 'Last Updated', 'Current Ver', 'Android Ver' and 'Size' in df. Update df",
 'output': 'The updated dataframe with the specified columns removed.'}

In [8]:
agent("shape of df")



> Entering new AgentExecutor chain...
Thought: I need to find the shape of the dataframe, which will tell me the number of rows and columns.
Action: python_repl_ast
Action Input: df.shape
(10841, 9)
10841 rows and 9 columns
Final Answer: The shape of the dataframe is (10841, 9).

> Finished chain.


{'input': 'shape of df', 'output': 'The shape of the dataframe is (10841, 9).'}
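For comparison, here is a minimal sketch of the same column drop in plain pandas, using a hypothetical `toy` frame. Passing `columns=` and reassigning the result, instead of `axis=1, inplace=True`, is a style choice made for this example and not what the agent ran.

```python
import pandas as pd

# Hypothetical toy frame with two columns we want to discard.
toy = pd.DataFrame({
    "App": ["A", "B"],
    "Rating": [4.1, 3.9],
    "Size": ["19M", "14M"],
    "Android Ver": ["4.0.3 and up", "4.2 and up"],
})

unwanted = ["Size", "Android Ver"]

# drop() returns a new frame; the original `toy` is left untouched.
trimmed = toy.drop(columns=unwanted)

print(trimmed.shape)
```

Keeping the original frame around makes it easy to compare shapes before and after the drop.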

### Task 7: Removing Duplicate Rows.
Great progress! We successfully dropped the columns. Now, let's create a prompt to ask our agent to remove duplicate rows from the playstore dataset.

In [9]:
# write your prompt using the agent
agent("""remove duplicate rows from 'df' while keeping the first occurrence of each duplicated row. update df.""")



> Entering new AgentExecutor chain...
Thought: I need to use the drop_duplicates() function from pandas.
Action: python_repl_ast
Action Input: df.drop_duplicates(keep='first', inplace=True)
I now know the final answer
Final Answer: The final answer is that the duplicate rows have been removed from the dataframe 'df' while keeping the first occurrence of each duplicated row. The updated dataframe is now stored in the variable 'df'.

> Finished chain.


{'input': "remove duplicate rows from 'df' while keeping the first occurrence of each duplicated row. update df.",
 'output': "The final answer is that the duplicate rows have been removed from the dataframe 'df' while keeping the first occurrence of each duplicated row. The updated dataframe is now stored in the variable 'df'."}

In [10]:
agent("shape of df")



> Entering new AgentExecutor chain...
Thought: I need to find the shape of the dataframe, which will tell me the number of rows and columns.
Action: python_repl_ast
Action Input: df.shape
(10356, 9)
10356 rows and 9 columns
Final Answer: The shape of the dataframe is (10356, 9).

> Finished chain.


{'input': 'shape of df', 'output': 'The shape of the dataframe is (10356, 9).'}
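The shape fell from 10841 to 10356 rows, so 485 rows were removed, two more than the 483 duplicates counted in Task 5. One plausible explanation, which the source does not confirm, is that dropping four columns made a few more rows identical. The sketch below shows that effect on a hypothetical `toy` frame.

```python
import pandas as pd

# Hypothetical toy frame: the first two rows differ only in "Size".
toy = pd.DataFrame({
    "App": ["A", "A", "B"],
    "Rating": [4.0, 4.0, 3.5],
    "Size": ["19M", "21M", "8M"],
})

# With "Size" present, no row repeats an earlier one.
dups_before = int(toy.duplicated().sum())

# After dropping "Size", the second row becomes a duplicate of the first.
trimmed = toy.drop(columns=["Size"])
dups_after = int(trimmed.duplicated().sum())

# keep="first" retains the first occurrence, as in the agent's call.
deduped = trimmed.drop_duplicates(keep="first")

print(dups_before, dups_after, len(deduped))
```

Counting duplicates again after any column drop is a cheap way to keep the numbers in the notebook consistent.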

In [11]:
agent("number of null values in df")



> Entering new AgentExecutor chain...
Thought: I need to use a pandas function to count the null values in the dataframe
Action: python_repl_ast
Action Input: df.isnull().sum()
App                  0
Category             0
Rating            1465
Reviews              1
Installs             1
Type                 1
Price                1
Content Rating       1
Genres               0
dtype: int64
I now know the final answer
Final Answer: There are 1465 null values in the 'Rating' column and 1 null value in all other columns in the dataframe.

> Finished chain.


{'input': 'number of null values in df',
 'output': "There are 1465 null values in the 'Rating' column and 1 null value in all other columns in the dataframe."}

### Task 8: Filling missing values in the DataFrame.
Good job! The duplicate rows have been successfully dropped. Let's now build prompts asking our agent to fill missing values in the Rating, Reviews, Price and Installs columns with the mean, missing values in the Type column with the string 'Free', and missing values in the Content Rating column with the string 'Everyone'.

In [12]:
# write your prompt using the agent
agent("""In df, do the following:
-for null values in column 'Type', fill string 'Free',
-for null values in column 'Content Rating', fill string 'Everyone'.
Update df""")



> Entering new AgentExecutor chain...
Thought: I need to use the fillna() function to fill in the null values in the specified columns.
Action: python_repl_ast
Action Input: df.fillna({'Type': 'Free', 'Content Rating': 'Everyone'}, inplace=True)
I now know the final answer
Final Answer: The null values in columns 'Type' and 'Content Rating' have been filled with the specified values and the dataframe has been updated.

> Finished chain.


{'input': "In df, do the following:\n-for null values in column 'Type', fill string 'Free',\n-for null values in column 'Content Rating', fill string 'Everyone'.\nUpdate df",
 'output': "The null values in columns 'Type' and 'Content Rating' have been filled with the specified values and the dataframe has been updated."}

In [13]:
agent("number of null values in df")



> Entering new AgentExecutor chain...
Thought: I need to use a pandas function to count the null values in the dataframe
Action: python_repl_ast
Action Input: df.isnull().sum()
App                  0
Category             0
Rating            1465
Reviews              1
Installs             1
Type                 0
Price                1
Content Rating       0
Genres               0
dtype: int64
I now know the final answer
Final Answer: There are 1465 null values in the 'Rating' column, 1 null value in the 'Reviews' column, 1 null value in the 'Installs' column, and 1 null value in the 'Price' column.

> Finished chain.


{'input': 'number of null values in df',
 'output': "There are 1465 null values in the 'Rating' column, 1 null value in the 'Reviews' column, 1 null value in the 'Installs' column, and 1 null value in the 'Price' column."}

In [18]:
agent("""In df, Please fill missing values in the following columns with the mean: 'Rating',
'Reviews',
'Price',
'Installs' """)



> Entering new AgentExecutor chain...
Thought: I need to use the fillna() function to fill in missing values in the specified columns.
Action: python_repl_ast
Action Input: df.fillna(df.mean(), inplace=True)
I now know the final answer
Final Answer: df.fillna(df.mean(), inplace=True)

> Finished chain.


{'input': "In df, Please fill missing values in the following columns with the mean: 'Rating', \n'Reviews',\n'Price',\n'Installs' ",
 'output': 'df.fillna(df.mean(), inplace=True)'}

In [19]:
agent("number of null values in df")



> Entering new AgentExecutor chain...
Thought: I need to use a pandas function to count the null values in the dataframe
Action: python_repl_ast
Action Input: df.isnull().sum()
App               0
Category          0
Rating            0
Reviews           0
Installs          0
Type              0
Price             0
Content Rating    0
Genres            0
dtype: int64
I now know the final answer
Final Answer: There are no null values in the dataframe.

> Finished chain.


{'input': 'number of null values in df',
 'output': 'There are no null values in the dataframe.'}
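The agent's trace shows `df.fillna(df.mean(), inplace=True)`, which fills every numeric column at once. Depending on your pandas version, calling `mean()` on a frame that still contains text columns may raise an error unless you restrict it to numeric data. The sketch below is one explicit alternative on a hypothetical `toy` frame: it names the columns to fill with the mean and handles the text column separately.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in numeric and text columns.
toy = pd.DataFrame({
    "Rating": [4.0, np.nan, 5.0],
    "Reviews": [100.0, 300.0, np.nan],
    "Type": ["Free", None, "Paid"],
})

# Fill only the named numeric columns, each with its own column mean.
mean_columns = ["Rating", "Reviews"]
toy[mean_columns] = toy[mean_columns].fillna(toy[mean_columns].mean())

# Fill the text column with a fixed label, as in the Task 8 prompt.
toy = toy.fillna({"Type": "Free"})

print(toy)
```

Naming the columns keeps the fill limited to what the task asked for, even if more numeric columns are added later.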

### Task 9: Eliminating Data Anomalies.

Wow! We've successfully filled the missing values. Now, let's develop a prompt to tell our agent to remove the rows with Category equal to '1.9' from the DataFrame.

In [20]:
# write your prompt using the agent
agent("remove rows with 'Category' equal to  '1.9' from df. Update df.")



> Entering new AgentExecutor chain...
Thought: I need to use the .loc method to select the rows with 'Category' equal to '1.9' and then use the .drop method to remove them from df.
Action: python_repl_ast
Action Input: df.loc[df['Category'] == '1.9']
                                           App Category  Rating  \
10472  Life Made WI-Fi Touchscreen Photo Frame      1.9    19.0   

             Reviews      Installs Type  Price Content Rating  \
10472  405963.442105  1.415856e+07    0  1.031       Everyone   

                  Genres  
10472  February 11, 2018  
This output looks abbreviated, so I need to use the .drop method to remove the row with index 10472 from df.
Action: python_repl_ast
Action Input: df.drop(10472, inplace=True)
Now I need to check if the row was successfully removed by printing the first few rows of df.
Action: python_repl_ast
Action Input: print(df.head())

{'input': "remove rows with 'Category' equal to  '1.9' from df. Update df.",
 'output': "The rows with 'Category' equal to '1.9' have been successfully removed from df."}

In [21]:
agent("shape of df")



> Entering new AgentExecutor chain...
Thought: I need to find the shape of the dataframe, which will tell me the number of rows and columns.
Action: python_repl_ast
Action Input: df.shape
(10355, 9)
10355 rows and 9 columns
Final Answer: The shape of the dataframe is (10355, 9).

> Finished chain.


{'input': 'shape of df', 'output': 'The shape of the dataframe is (10355, 9).'}
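The same removal can be written as a boolean filter instead of dropping by index label. The sketch below uses a hypothetical `toy` frame. One thing to keep in mind: the trace shows the anomalous row with a Rating of 19.0, and it was still in the frame when the means were computed in Task 8, so removing anomalies before filling is worth considering in your own runs.

```python
import pandas as pd

# Hypothetical toy frame; the middle row has a misplaced value in "Category".
toy = pd.DataFrame({
    "App": ["Maps", "Photo Frame", "Notes"],
    "Category": ["TRAVEL", "1.9", "TOOLS"],
})

# Keep every row whose Category is not the anomalous string '1.9'.
cleaned = toy[toy["Category"] != "1.9"].reset_index(drop=True)

print(cleaned)
```

A filter like this removes every matching row, however many there are, without needing to know their index labels in advance.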

### Task 10: Check App Distribution.

Fantastic! We've successfully removed the anomalies. Now, let's construct a prompt that tells our agent to count occurrences of each unique value in the Type column.

In [22]:
# write your prompt using the agent
agent("In dataframe 'df', count unique values for column 'Type' ")



> Entering new AgentExecutor chain...
Thought: I need to use the pandas function 'nunique' to count unique values in a column
Action: python_repl_ast
Action Input: df['Type'].nunique()
2
2 unique values were counted in the 'Type' column
Final Answer: 2

> Finished chain.


{'input': "In dataframe 'df', count unique values for column 'Type' ",
 'output': '2'}
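The task asks for the occurrences of each unique value, but the trace shows the agent calling `nunique()`, which only returns how many distinct values exist. The sketch below, on a hypothetical `toy` frame, contrasts that with `value_counts()`, which gives the per-value distribution the task describes.

```python
import pandas as pd

# Hypothetical toy frame with three free apps and one paid app.
toy = pd.DataFrame({"Type": ["Free", "Free", "Paid", "Free"]})

# nunique(): how many distinct values the column holds.
n_unique = toy["Type"].nunique()

# value_counts(): how many rows carry each distinct value.
counts = toy["Type"].value_counts()

print(n_unique)
print(counts)
```

A more specific prompt, for example "show the value counts of column 'Type' in df", may steer the agent toward the second call.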

### Task 11:  Cleaning App Name.
Amazing! Let's now write a prompt telling our agent to find the rows in the 'df' DataFrame where the App column starts with '?' and remove them.

In [23]:
# write your prompt using the agent
agent("In df, remove rows with value starting with '?' for column 'App'. Update df.")



> Entering new AgentExecutor chain...
Thought: I need to use the pandas dataframe method 'drop' to remove rows with '?' in the 'App' column.
Action: python_repl_ast
Action Input: df.drop(df[df['App'].str.startswith('?')].index, inplace=True)
I now know the final answer
Final Answer: The updated dataframe with rows removed where the 'App' column starts with '?' is now stored in the variable 'df'.

> Finished chain.


{'input': "In df, remove rows with value starting with '?' for column 'App'. Update df.",
 'output': "The updated dataframe with rows removed where the 'App' column starts with '?' is now stored in the variable 'df'."}
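Here is a minimal sketch of the same filter in plain pandas, on a hypothetical `toy` frame. `str.startswith` matches the literal character '?', so a name that merely contains a question mark is kept. The `na=False` argument is an addition for this example: it treats a missing name as "does not start with '?'", which the agent's call did not need because the App column had no nulls.

```python
import pandas as pd

# Hypothetical toy frame: one garbled name, one missing name, one with '?' at the end.
toy = pd.DataFrame({"App": ["?????", "Maps", None, "Is it safe?"]})

# True only where the name begins with a literal '?'; missing names count as False.
starts_with_q = toy["App"].str.startswith("?", na=False)

# Keep the rows that do not start with '?'.
cleaned = toy[~starts_with_q]

print(cleaned)
```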

In [24]:
# export the cleaned dataframe 'df' with the name 'cleaned_apps_db.csv' using the parameter index=True.
agent("save the dataframe df as 'cleaned_apps_db.csv' using the parameter index=True")



> Entering new AgentExecutor chain...
Thought: I need to use the to_csv method to save the dataframe as a csv file
Action: python_repl_ast
Action Input: df.to_csv('cleaned_apps_db.csv', index=True)
I now know the final answer
Final Answer: The dataframe df has been saved as 'cleaned_apps_db.csv' with the index included.

> Finished chain.


{'input': "save the dataframe df as 'cleaned_apps_db.csv' using the parameter index=True",
 'output': "The dataframe df has been saved as 'cleaned_apps_db.csv' with the index included."}
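Saving with `index=True` writes the row labels as an extra, unnamed first column. The sketch below shows what that means when the file is read back. It uses a hypothetical `toy` frame and an in-memory buffer instead of a real file, so nothing is written to disk.

```python
import io

import pandas as pd

# Hypothetical toy frame standing in for the cleaned data.
toy = pd.DataFrame({"App": ["A", "B"], "Rating": [4.1, 3.9]})

# Write to an in-memory buffer with the index included.
buffer = io.StringIO()
toy.to_csv(buffer, index=True)

# A plain read turns the saved index into a column named "Unnamed: 0".
buffer.seek(0)
as_column = pd.read_csv(buffer)

# Passing index_col=0 restores it as the index instead.
buffer.seek(0)
as_index = pd.read_csv(buffer, index_col=0)

print(list(as_column.columns))
print(list(as_index.columns))
```

If the file is later loaded into a database, that extra column can serve as a row identifier, though the source does not say how the export is used.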

### Task 12: Display Top 5 Records for playstore_reviews.

Great! Our next step is to initialize a second agent (backed by OpenAI). Once that's done, we'll create a prompt to display the first 5 records from our playstore_reviews table.

In [3]:
#Load the dataset (playstore_reviews.csv) and assign it to a variable 'df1'.
df1 = pd.read_csv('playstore_reviews.csv')
# Initializing the agent1.
agent1 = create_pandas_dataframe_agent(OpenAI(temperature=0),
              df1, verbose=True)

# write your prompt using the agent1
agent1("""Display first 5 records from df1""")





> Entering new AgentExecutor chain...
Thought: I need to use the head() function to display the first 5 records.
Action: python_repl_ast
Action Input: print(df.head())
                     App                                  Translated_Review  \
0  10 Best Foods for You  I like eat delicious food. That's I'm cooking ...   
1  10 Best Foods for You    This help eating healthy exercise regular basis   
2  10 Best Foods for You                                                NaN   
3  10 Best Foods for You         Works great especially going grocery store   
4  10 Best Foods for You                                       Best idea us   

  Sentiment  Sentiment_Polarity  Sentiment_Subjectivity  
0  Positive                1.00                0.533333  
1  Positive                0.25                0.288462  
2       NaN                 NaN                     NaN  
3  Positive                0.40                0.875000  
4  Positive                1

{'input': 'Display first 5 records from df1',
 'output': 'The first 5 records from df are displayed using the head() function.'}

### Task 13: Determine the Shape (Count Rows and Columns) of the Dataset.
Great! For our next step, let's create a prompt to gather information about the shape of our playstore_reviews dataset. This will help us understand how many rows and columns it contains.

In [4]:
# write your prompt using the agent1
agent1("shape of df1")



> Entering new AgentExecutor chain...
Thought: I need to use the shape attribute of the dataframe to get the number of rows and columns
Action: python_repl_ast
Action Input: df.shape
(64295, 5)
64295 rows and 5 columns
Final Answer: The shape of df1 is (64295, 5).

> Finished chain.


{'input': 'shape of df1', 'output': 'The shape of df1 is (64295, 5).'}

### Task 14: Finding the Missing data.
Wow! We've successfully found out the shape of the dataset. Now, let's create a prompt to ask our agent to help us find details about any missing information in the playstore_reviews table.

In [5]:
# write your prompt using the agent1
agent1("number of null values in df1")



> Entering new AgentExecutor chain...
Thought: I need to use the isnull() function to check for null values, and then use the sum() function to get the total number of null values.
Action: python_repl_ast
Action Input: df.isnull().sum()
App                           0
Translated_Review         26868
Sentiment                 26863
Sentiment_Polarity        26863
Sentiment_Subjectivity    26863
dtype: int64
Now I need to sum up all of these values to get the total number of null values.
Action: python_repl_ast
Action Input: df.isnull().sum().sum()
107457
I now know the final answer
Final Answer: 107457

> Finished chain.


{'input': 'number of null values in df1', 'output': '107457'}

### Task 15: Identifying Duplicate Rows.
Fantastic! We've successfully found the missing values. Now, let's create a prompt to ask our agent to count the number of duplicate rows in the playstore_reviews dataset.

In [6]:
# write your prompt using the agent1
agent1("number of duplicate rows for each column in df1")



> Entering new AgentExecutor chain...
Thought: I need to find the number of duplicate rows for each column in the dataframe.
Action: python_repl_ast
Action Input: df.duplicated().sum()
33616
33616 is the total number of duplicate rows in the dataframe.
Final Answer: 33616

> Finished chain.


{'input': 'number of duplicate rows for each column in df1', 'output': '33616'}

### Task 16: Removing Missing values.
Incredible! We successfully identified the duplicates. Now, let's create a prompt to ask our agent to drop rows with missing values from the playstore_reviews dataset.

In [7]:
# write your prompt using the agent1
agent1("""remove rows with null values in df1. update df1.""")



> Entering new AgentExecutor chain...
Thought: I need to use the dropna() function to remove rows with null values.
Action: python_repl_ast
Action Input: df1.dropna()
NameError: name 'df1' is not defined
I need to define df1 as the dataframe I am working with.
Action: python_repl_ast
Action Input: df1 = df
Now I can use the dropna() function on df1.
Action: python_repl_ast
Action Input: df1.dropna()
                                  App  \
0               10 Best Foods for You   
1               10 Best Foods for You   
3               10 Best Foods for You   
4               10 Best Foods for You   
5               10 Best Foods for You   
...                               ...   
64222  Housing-Real Estate & Property   
64223  Housing-Real Estate & Property   
64226  Housing-Real Estate & Property   
64227  Housing-Real Estate & Property   
64230  Housing-Real Estate & Property   


{'input': 'remove rows with null values in df1. update df1.',
 'output': 'df1.dropna()'}

In [8]:
agent1("shape of df1")



> Entering new AgentExecutor chain...
Thought: I need to use the shape attribute of the dataframe to get the number of rows and columns
Action: python_repl_ast
Action Input: df.shape
(64295, 5)
64295 rows and 5 columns
Final Answer: The shape of df1 is (64295, 5).

> Finished chain.


{'input': 'shape of df1', 'output': 'The shape of df1 is (64295, 5).'}
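The shape check still reports 64295 rows, so the missing values were not actually removed. The trace suggests why: inside the agent's Python tool the dataframe is named `df` (hence the `NameError` for `df1`), and `dropna()` was called without assigning its result, so the new frame was discarded. The sketch below reproduces that behavior on a hypothetical `toy` frame.

```python
import pandas as pd

# Hypothetical toy frame with one missing sentiment.
toy = pd.DataFrame({
    "App": ["A", "A", "B"],
    "Sentiment": ["Positive", None, "Negative"],
})

# dropna() returns a new frame; without assignment, `toy` is unchanged.
toy.dropna()
unchanged_rows = len(toy)

# Assigning the result back is what actually updates the variable.
toy = toy.dropna()

print(unchanged_rows, len(toy))
```

When prompting the agent, referring to the frame as `df` and asking it to "update df in place" may avoid this, though the agent's exact behavior can vary between runs.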

### Task 17: Removing Duplicate Rows.
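Great progress! Note, however, that the shape check above still reports 64295 rows: the agent called `dropna()` without saving the result, so the rows with missing values are still in the dataframe. Now, let's create a prompt to ask our agent to remove duplicate rows from the playstore_reviews dataset.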

In [9]:
# write your prompt using the agent1
agent1("""remove duplicate rows from 'df1' while keeping the first occurrence of each duplicated row. update df1.""")



> Entering new AgentExecutor chain...
Thought: I need to use the drop_duplicates() function from pandas.
Action: python_repl_ast
Action Input: df.drop_duplicates(keep='first', inplace=True)
I now know the final answer
Final Answer: df1.drop_duplicates(keep='first', inplace=True)

> Finished chain.


{'input': "remove duplicate rows from 'df1' while keeping the first occurrence of each duplicated row. update df1.",
 'output': "df1.drop_duplicates(keep='first', inplace=True)"}

In [10]:
agent1("shape of df1")



> Entering new AgentExecutor chain...
Thought: I need to use the shape attribute of the dataframe to get the number of rows and columns
Action: python_repl_ast
Action Input: df.shape
(30679, 5)
30679 rows and 5 columns
Final Answer: The shape of df1 is (30679, 5).

> Finished chain.


{'input': 'shape of df1', 'output': 'The shape of df1 is (30679, 5).'}
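The row count fell from 64295 to 30679, which matches the 33616 duplicates counted in Task 15. Removing duplicates is not the same as removing missing values, though: identical rows with missing values collapse to one copy, but that copy stays. The sketch below shows this on a hypothetical `toy` frame, so it may be worth re-checking the null counts on the real data before exporting.

```python
import pandas as pd

# Hypothetical toy frame: two identical rows with a missing sentiment.
toy = pd.DataFrame({
    "App": ["A", "A", "A", "B"],
    "Sentiment": [None, None, "Positive", None],
})

# Identical rows (including identical missing values) collapse to the first copy.
deduped = toy.drop_duplicates(keep="first")

# Rows with missing values that are not exact repeats are still present.
remaining_nulls = int(deduped["Sentiment"].isnull().sum())

print(len(deduped), remaining_nulls)
```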

In [11]:
# export the cleaned dataframe 'df1' with the name 'cleaned_reviews_db.csv' using the parameter index=True.
agent1("save the dataframe df1 as 'cleaned_reviews_db.csv' using the parameter index=True")



> Entering new AgentExecutor chain...
Thought: I need to use the to_csv method to save the dataframe as a csv file.
Action: python_repl_ast
Action Input: df.to_csv('cleaned_reviews_db.csv', index=True)
I now know the final answer
Final Answer: The dataframe df1 has been saved as 'cleaned_reviews_db.csv' with the index included.

> Finished chain.


{'input': "save the dataframe df1 as 'cleaned_reviews_db.csv' using the parameter index=True",
 'output': "The dataframe df1 has been saved as 'cleaned_reviews_db.csv' with the index included."}