# <center>MDC Capstone
## <center>Webscraping and Data Cleaning
### <center>By Joemichael Alvarez

This is a two-part capstone project that combines webscraping with sentiment analysis. As an added bonus, Sk-LLM is used in the second part so that its output can be compared against our original reviews.

_You can find the sentiment analysis portion of this project [here](https://colab.research.google.com/drive/1iGyegl0h4RyzMYWqLvkAmXjLU4bqC77T?usp=sharing)._

# Installations

In [None]:
#installations
!pip install requests beautifulsoup4

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


## Libraries

In [None]:
#libraries
import pandas as pd
import requests
from datetime import datetime
from bs4 import BeautifulSoup
import string
from google.colab import files

#quality of life
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

In [None]:
#requirements.txt
!pip freeze > requirements.txt

#download txt
files.download('requirements.txt')


# Webscraping

In [None]:
#create an empty dataframe to hold the scraped reviews
reviews_df = pd.DataFrame({'CompanyName': [], 'Review': [], 'Date': []})

#url with multiple possible companies and parse
primary_url = "https://www.consumeraffairs.com/insurance/health.html#best-rated-all"
primary_response = requests.get(primary_url)
primary_soup = BeautifulSoup(primary_response.content, "html.parser")

#for loop going through company urls in HTML
for i in primary_soup.find_all("div", {"class": "brd-card__tit-innr"}):
    link = i.find("a")["href"]
    if link[:5] == 'https': #only selects links where reviews are present
      #print(link) #verify correct company selection

      #refresh url with base review page 1 and parse
      url = link+'?page=1#sort=recent&filter=none'
      response = requests.get(url)
      soup = BeautifulSoup(response.content, 'html.parser')

      #find last page for reviews and create a variable
      last_page = soup.find("div", {'class': 'js-paginator-data'})['data-last-page'] #find returns the first matching tag (find_all would return a list that needs indexing)

      #for loop using last page as max iteration
      for j in range(int(last_page)+1):
        #ensure we dont use page 0 since it does not exist then parse
        if j != 0:
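          #print(url.replace('?page=1', '?page='+str(j))) #verify pages are being shifted through
          #swap only the page parameter; replacing every '1' would corrupt any url that contains that digit elsewhere
          response = requests.get(url.replace('?page=1', '?page='+str(j)))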
          soup = BeautifulSoup(response.content, 'html.parser')
          reviews = soup.find_all('div', {'class': 'rvw js-rvw'})

          #for loop going through all reviews and putting them in dataframe
          for review in reviews:
            review_text = review.find('div', {'class': 'rvw-bd'}).text.strip()
            review_text = review_text.replace('Original review:', '')
            review_text = review_text.replace('Read full review', '')
            review_text = review_text.replace('Resolution response: ', '')
            review_text = review_text.replace('\n', '')
            date_string = review_text.split(", ",1)[0]+', '+(review_text.split(", ",1)[1])[:4]
            date_string = date_string.replace('Sept.', 'September'
            ).replace('Oct.', 'October'
            ).replace('Jan.', 'January'
            ).replace('Aug.', 'August'
            ).replace('Dec.', 'December'
            ).replace('Feb.', 'February'
            ).replace('Nov.', 'November')
            review_text = review_text.split(", ",1)[1]
            #initially tried a try/except clause but datetime was not happy with that
            #opted for the KISS (keep it simple) approach instead
            reviews_df = pd.concat(
                [reviews_df, pd.DataFrame(
                    {'CompanyName': link[42:-5].strip(), 'Review': [review_text[4:].strip()], 'Date': [datetime.strptime(
                        date_string.strip(), '%B %d, %Y')]})], ignore_index = True)
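
The date handling in the loop above can be pulled out into a small helper, which makes it easy to check without hitting the site. This is only a minimal sketch, not the code the notebook actually ran: `parse_review_date` and `MONTH_ABBREVIATIONS` are hypothetical names, and the sample strings are made up to mirror the "Sept. 1, 2022" style handled by the chained replacements.

```python
from datetime import datetime

# Abbreviated month names mapped to the full names that '%B' expects.
# Assumption: months not listed here arrive already spelled out.
MONTH_ABBREVIATIONS = {
    'Jan.': 'January',
    'Feb.': 'February',
    'Aug.': 'August',
    'Sept.': 'September',
    'Oct.': 'October',
    'Nov.': 'November',
    'Dec.': 'December',
}

def parse_review_date(date_string):
    """Convert a string such as 'Sept. 1, 2022' into a datetime (hypothetical helper)."""
    cleaned = date_string.strip()
    for abbreviation, full_name in MONTH_ABBREVIATIONS.items():
        if cleaned.startswith(abbreviation):
            # Swap only the leading abbreviation for the full month name.
            cleaned = full_name + cleaned[len(abbreviation):]
            break
    return datetime.strptime(cleaned, '%B %d, %Y')

print(parse_review_date('Sept. 1, 2022'))
print(parse_review_date('May 11, 2023'))
```

Keeping the abbreviations in one dictionary means a new spelling only needs one extra entry rather than another chained `.replace()`.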

Let's make sure everything looks good.

In [None]:
#columns and shape
reviews_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10767 entries, 0 to 10766
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   CompanyName  10767 non-null  object        
 1   Review       10767 non-null  object        
 2   Date         10767 non-null  datetime64[ns]
dtypes: datetime64[ns](1), object(2)
memory usage: 252.5+ KB


In [None]:
#first few records
reviews_df.head()

Unnamed: 0,CompanyName,Review,Date
0,united_am,"I've been with United American Insurance for lots of years and I just pay my bill every month. Once a year, I get a letter that they're increasing their rates, which I would prefer they didn't, but I can understand that it has to. A lot of my friends have HMOs but I like that with my insurance, I can go to whichever doctor I want. I've not had one doctor say they don't accept this insurance. Whatever Medicare paid, United American Insurance paid the other 20%. I am pleased with the service I'm getting, where everything is covered and the choices are mine. Others have to go for a second opinion but I don't. If I feel I need to see a special doctor, I can make an appointment and go. So, the flexibility is worth the money.",2023-05-11
1,united_am,"United American has been fantastic. We have had not one bit of trouble collecting or paying for anything at all. So, we're extremely happy. I've had a lot of illnesses so many times and when you had as much as I've had, you get canceled pretty quickly by other companies. But United have not questioned and they have been unbelievable. We are very fortunate to have gotten them. We pay quite a bit, but it's worth the cost. When I was very ill on sepsis and pancreatic stones, I was in the hospital for 15 days and they were no questions asked. In the last few years, I've had treatments for my breast cancer and there were no questions about that either.",2023-04-22
2,united_am,"We've had very good luck with United American. We got it in 2001 and we have used it. My husband had open heart surgery quite a few years ago. Now, there was a discrepancy. I was just hospitalized from an accident on the farm here and we had a little problem, but it was more of the hospital. They kept saying I still owed $1,500. When we called United American, they said no, they had mailed a check. Somewhere, the check must have gotten lost, so they were going to rebill them again. But that was the only time that I had a problem. We're very satisfied with what we've got and our experience with it has been very good.",2022-10-28
3,united_am,"We've had a United American Insurance policy for a long time, and the coverage has been excellent. It's like someone sweeping then someone picking up the filings with a magnet. It's very complete. We also get the mailings showing what has been paid. At one time, we had a rep, but I have never had to call customer service, because everything seemed to be always completed. Ever since we went on Medicare, we've enjoyed the service, and we've never had any reason to even contact anyone. UAI is fabulous, and I'd recommend it to anyone. It's not inexpensive, but when it comes right down to it, it's worth every penny. I just had some major surgery and I haven't seen a bill from it.",2022-10-12
4,united_am,"We were with Cleveland Clinic and when they stopped taking HMO, United American Insurance was there to pick up the pieces. I've had them ever since and they've always been extremely good with me. When I asked them a question, they gave me an answer. They've been fair too and I have never had anybody question me. All this time, I've had three back surgeries and two other major surgeries. I've never had them deny a claim that I had to go back for. Sometimes United American Insurance is a little more expensive than HMO. But they have the same things as an HMO. I had to have my eyes examined and they do a refractory that insurances don't pay for. A gentleman went too and he had insurance for the eyes. He gave a card and was told, “We're sorry, we don't take that anymore because they don't pay their bills.” I have never had anyone tell me they wouldn't take United American Insurance.",2022-09-01


In [None]:
#company names
reviews_df['CompanyName'].value_counts()

united_health_care        1953
humana                    1345
kaiser                    1261
aetna_health              1118
cigna_health              1083
anthem                     615
humana-right-source-rx     453
wellcare                   445
bluecross_fl               427
health_net                 422
united_am                  393
bluecross_ca               236
bluecross_il               151
golden_rule                136
aarp_health                131
amerihealth                128
cigna_tel_drug              88
carefirst                   83
bluecross_nj                82
ihc-health-solutions        75
bluecross_ny                55
highmark                    46
amer_rep                    33
oxford-health-plans          7
medicareenrollmentcom        1
Name: CompanyName, dtype: int64

Let's change some names around for clarity.

# Data Cleaning

In [None]:
#initially thought about a for loop, but it would still require writing every pair out
#KISS strikes again
reviews_df_nc= pd.DataFrame(reviews_df['CompanyName'].replace('humana-right-source-rx', 'humana' #nc for name change
  ).replace('oxford-health-plans', 'united_health_care' #oxford is owned by united
  ).replace('cigna_tel_drug', 'cigna_health'
  ).replace('bluecross_fl','bluecross' #lets unify all the bluecross
  ).replace('bluecross_ca', 'bluecross'
  ).replace('bluecross_il', 'bluecross'
  ).replace('bluecross_nj', 'bluecross'
  ).replace('bluecross_ny', 'bluecross'
  ).replace('united_am', 'united_american'
  ).replace('amer_rep', 'american_republic'
  ).replace('golden_rule', 'united_health_care' #golden rule is owned by united
  ).replace('ihc-health-solutions', 'ihc_health_solutions')).join(reviews_df['Review']).join(reviews_df['Date'])

reviews_df_nc['CompanyName'].value_counts()

united_health_care       2096
humana                   1798
kaiser                   1261
cigna_health             1171
aetna_health             1118
bluecross                 951
anthem                    615
wellcare                  445
health_net                422
united_american           393
aarp_health               131
amerihealth               128
carefirst                  83
ihc_health_solutions       75
highmark                   46
american_republic          33
medicareenrollmentcom       1
Name: CompanyName, dtype: int64
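
The renames above can also be expressed as a single dictionary passed to `replace`. The snippet below is a sketch on a made-up five-row sample, not the full dataset: `sample` and `name_map` are hypothetical names, and the dictionary holds only a subset of the pairs used in the cell above.

```python
import pandas as pd

# Hypothetical sample standing in for reviews_df['CompanyName'].
sample = pd.DataFrame({
    'CompanyName': ['united_am', 'bluecross_fl', 'bluecross_ca', 'golden_rule', 'kaiser']
})

# One dictionary holds every old -> new pair (subset of the renames above).
name_map = {
    'united_am': 'united_american',
    'bluecross_fl': 'bluecross',
    'bluecross_ca': 'bluecross',
    'golden_rule': 'united_health_care',
}

# Series.replace accepts a dict; names without an entry are left unchanged.
renamed = sample.assign(CompanyName=sample['CompanyName'].replace(name_map))
print(renamed['CompanyName'].value_counts())
```

The pairs still have to be written out by hand, but they live in one place and the other columns come along without any joins.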

Looking much better. Next, let's go through all the review text and make sure that only the characters we expect remain.

In [None]:
#keep only ASCII letters, digits, punctuation, spaces, and curly quotes; drop every other character
all_chars = list(string.ascii_letters + string.digits + string.punctuation + '“’”‘  ') #a bit of trial and error to get this part right

#a set lookup is faster than scanning the list for every character
allowed_chars = set(all_chars)

#rebuild each review from its allowed characters only
reviews_df_nc['Review'] = reviews_df_nc['Review'].apply(
    lambda review: ''.join(ch for ch in review if ch in allowed_chars).strip())

reviews_df_clean = reviews_df_nc

reviews_df_clean.head()

Unnamed: 0,CompanyName,Review,Date
0,united_american,"I've been with United American Insurance for lots of years and I just pay my bill every month. Once a year, I get a letter that they're increasing their rates, which I would prefer they didn't, but I can understand that it has to. A lot of my friends have HMOs but I like that with my insurance, I can go to whichever doctor I want. I've not had one doctor say they don't accept this insurance. Whatever Medicare paid, United American Insurance paid the other 20%. I am pleased with the service I'm getting, where everything is covered and the choices are mine. Others have to go for a second opinion but I don't. If I feel I need to see a special doctor, I can make an appointment and go. So, the flexibility is worth the money.",2023-05-11
1,united_american,"United American has been fantastic. We have had not one bit of trouble collecting or paying for anything at all. So, we're extremely happy. I've had a lot of illnesses so many times and when you had as much as I've had, you get canceled pretty quickly by other companies. But United have not questioned and they have been unbelievable. We are very fortunate to have gotten them. We pay quite a bit, but it's worth the cost. When I was very ill on sepsis and pancreatic stones, I was in the hospital for 15 days and they were no questions asked. In the last few years, I've had treatments for my breast cancer and there were no questions about that either.",2023-04-22
2,united_american,"We've had very good luck with United American. We got it in 2001 and we have used it. My husband had open heart surgery quite a few years ago. Now, there was a discrepancy. I was just hospitalized from an accident on the farm here and we had a little problem, but it was more of the hospital. They kept saying I still owed $1,500. When we called United American, they said no, they had mailed a check. Somewhere, the check must have gotten lost, so they were going to rebill them again. But that was the only time that I had a problem. We're very satisfied with what we've got and our experience with it has been very good.",2022-10-28
3,united_american,"We've had a United American Insurance policy for a long time, and the coverage has been excellent. It's like someone sweeping then someone picking up the filings with a magnet. It's very complete. We also get the mailings showing what has been paid. At one time, we had a rep, but I have never had to call customer service, because everything seemed to be always completed. Ever since we went on Medicare, we've enjoyed the service, and we've never had any reason to even contact anyone. UAI is fabulous, and I'd recommend it to anyone. It's not inexpensive, but when it comes right down to it, it's worth every penny. I just had some major surgery and I haven't seen a bill from it.",2022-10-12
4,united_american,"We were with Cleveland Clinic and when they stopped taking HMO, United American Insurance was there to pick up the pieces. I've had them ever since and they've always been extremely good with me. When I asked them a question, they gave me an answer. They've been fair too and I have never had anybody question me. All this time, I've had three back surgeries and two other major surgeries. I've never had them deny a claim that I had to go back for. Sometimes United American Insurance is a little more expensive than HMO. But they have the same things as an HMO. I had to have my eyes examined and they do a refractory that insurances don't pay for. A gentleman went too and he had insurance for the eyes. He gave a card and was told, “We're sorry, we don't take that anymore because they don't pay their bills.” I have never had anyone tell me they wouldn't take United American Insurance.",2022-09-01
...,...,...,...
10762,american_republic,"Despite a deductible of $4000 and a co-insurance of $5000, American Republic saw fit to increase my premium 56% from last year. This comes on top of annual increases since 2003 of 27% and 34%. I have never met my deductible and receive essentially nothing toward preventive care from this company. I'm tired of supporting the medical costs of people who eat at McDonald's and smoke cigarettes. I have cancelled this policy and, at 56 years old, am taking far less insurance, at one-third the cost, through AARP. Although it is not major medical, I refuse to subsidize these companies who care very little about the health (other than financial) of their clients.",2006-01-29
10763,american_republic,"I have 2 complaints against this company. We started out paying around 100.00 pr month for individual health for my husband. Within 2-3 years we got notice that it was going up to around 650.00. I called and was going to cancel when they assurred me that the increase was because this was an ancient policy. We took out another policy with them which has higher deductible and does not cover as much. The premium has more than doubled already and we have not had it a year yet. I know medical costs go up, but this is rediculous.The way I see it is that as soon as you are approved for a policy it becomes ancient.The pity of it is that my husband hardly uses it to begin with. It will be dropped the next time that I get notice of an increase! My second complaint is with their privacy issue.Last year AR was billed for charges that were supposed to go to workers comp and when I tried to tell them, I could not speak with them.My husband filled out the form giving permission for them to discuss his case, they sent him a letter to confirm that it was his signature. Before it was over he had signed 3 forms and called to verify. Last month the recovery service called me and asked all sorts of questions about past claims, yet when I called yesterday to find out the calander year date, they could not discuss anything because they could not find a permission form. We have about pulled out our hair dealing with these people and their sorry customer service.I am basically an honest person , but privacy issues go two ways. If someone makes a mistake I cannot report it because they do not have permission to speak with me. I feel that this company is price gouging to begin with - most companies give you a year before raising rates and they don't double the premium every time. They also make it so difficult to deal with them that most people give up- which probably saves on claims again. We are shopping for insurance at the moment. 
I refuse to hand them 1/3 of my income for nothing. I intend to tell everyone I know and to post online in every venue I can what a lousy deal you get with this company and recommend that they go elsewhere.If malpractice suits could be brought against customer service at American Republic I would be a billionaire soon.",2005-08-26
10764,american_republic,"We had insurance on both our sons, one who lives primarily in Boca Raton FL. We were assured by the salesman, Dale, that the president of the company approved this. Our son in Florida broke some ribs and went to the hospital and was informed that no doctor in south Fl was in American Republic's network. He was not able to get treatment without paying himself, although we paid premiums to American Republic each month since December. I emailed the company in early May and wrote to Mr Mikkelsen 3/31/5 and have never received a reply. Now this salesman, Dale, is calling our son in Florida and threatening him - 5 times today, 8/2/5 and he has a record of each call. The person who represents American Republic has threatened to hurt our son. What can we do?Our son never received proper treatment for three broken ribs except for the most minor of treatment which meant he was unable to work for a few weeks and we had to pay his rent, and basically support him. It is now August 2nd and he continues to have some discomfort.",2005-08-02
10765,american_republic,"First off, the reason I cancelled this company's health insurance coverage is because it DOUBLED in ONE year. I called on December 30, 2002 and spoke with Debbie from Customer Service to cancel my health insurance, THREE days before they should have taken the premium out of my account, she told me the draft had already left their office FOUR days before it was to come out of my account. I told her the money wasn't in there. I'll make a note of that, she said. Who would know they sent the draft from their office almost a WEEK before it was to come out? They could send it out of their office a month before and I wouldn't know. It is not in their policy how early they send it out. Around the 9th, I called and spoke to Vicki, a Supervisor, to see if they would reimburse me the $25 NSF fee they caused me and she told me basically that I was out of luck. Then, they tried to take it out of my account AGAIN on January 13th. Which caused the bank to charge me another $25 NSF fee. When I called again on the 15th to complain, I spoke with Melissa from Customer Service and she told me they had no record of trying to draft the money again. I faxed them the information that they HAD indeed tried and a letter, letting them know how I wanted it taken care of and they never returned my call. I called the bank and put a stop payment, which cost me ANOTHER $25, on the amount so they couldn't cause me anymore NSF fees. I called them again the next day and spoke with Marsha from Customer Service, she proceeded to tell me that wasn't their policy to refund charges they CAUSE! I am willing to eat the $25 stop payment, just so they can't continue to compound NSF fees. But I do expect to be reimubrsed by the company the $50 for NSF fees they caused. 
I can't help but wonder, had I been a satisfied customer calling them to ask them to delay the draft because the money would be late instead of an unhappy customer calling to cancel my health insurance, if the things would have turned out differently.It cost me $75 to cancel my health insurance coverage.",2003-01-17


Now let's ensure that our data is clean.

In [None]:
#checking for nulls
missing_data = reviews_df_clean.isnull()

#prints no missing values for all columns unless nulls exist
for column in missing_data.columns.values.tolist():
  if missing_data[column].sum() != 0:
    print(column)
    print (missing_data[column].sum())
    print("")
  else:
    print(column,"has no missing values\n")

CompanyName has no missing values

Review has no missing values

Date has no missing values



Looks good on nulls.
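
The same per-column check can be done in one call with `isnull().sum()`. This is a small sketch on a made-up sample with one deliberately missing review, so the names and values here are illustrative only.

```python
import pandas as pd

# Hypothetical sample with one deliberately missing review.
sample = pd.DataFrame({
    'CompanyName': ['kaiser', 'humana', 'anthem'],
    'Review': ['Great service.', None, 'Slow claims.'],
})

# isnull() marks each missing cell and sum() counts them per column.
null_counts = sample.isnull().sum()
print(null_counts)

# Keep only the columns that actually contain nulls.
columns_with_nulls = null_counts[null_counts > 0]
print(columns_with_nulls)
```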

In [None]:
#removing duplicate values
reviews_df_clean = reviews_df_clean.drop_duplicates(keep='first')
reviews_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10767 entries, 0 to 10766
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   CompanyName  10767 non-null  object        
 1   Review       10767 non-null  object        
 2   Date         10767 non-null  datetime64[ns]
dtypes: datetime64[ns](1), object(2)
memory usage: 336.5+ KB


No duplicates were found either: the row count is unchanged at 10,767.
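
To see how many rows would be removed before dropping anything, `duplicated()` can be counted first. A minimal sketch on a made-up sample (the rows below are illustrative, not scraped data):

```python
import pandas as pd

# Hypothetical sample where the second row repeats the first.
sample = pd.DataFrame({
    'CompanyName': ['kaiser', 'kaiser', 'humana'],
    'Review': ['Great service.', 'Great service.', 'Slow claims.'],
})

# duplicated() flags every repeat of an earlier row (keep='first' by default).
duplicate_count = int(sample.duplicated().sum())
deduplicated = sample.drop_duplicates(keep='first')

print(duplicate_count)
print(len(sample), len(deduplicated))
```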

# Adding Features

Let's explore the companies one by one. Each company is either publicly traded, private, or non-profit. We also have a single review for the medicareenrollmentcom site. Since one review does not make an analysis, we'll drop it.

In [None]:
#drop individual record
reviews_df_clean = reviews_df_clean[reviews_df_clean.CompanyName != 'medicareenrollmentcom']
reviews_df_clean.head()

Unnamed: 0,CompanyName,Review,Date
0,united_american,"I've been with United American Insurance for lots of years and I just pay my bill every month. Once a year, I get a letter that they're increasing their rates, which I would prefer they didn't, but I can understand that it has to. A lot of my friends have HMOs but I like that with my insurance, I can go to whichever doctor I want. I've not had one doctor say they don't accept this insurance. Whatever Medicare paid, United American Insurance paid the other 20%. I am pleased with the service I'm getting, where everything is covered and the choices are mine. Others have to go for a second opinion but I don't. If I feel I need to see a special doctor, I can make an appointment and go. So, the flexibility is worth the money.",2023-05-11
1,united_american,"United American has been fantastic. We have had not one bit of trouble collecting or paying for anything at all. So, we're extremely happy. I've had a lot of illnesses so many times and when you had as much as I've had, you get canceled pretty quickly by other companies. But United have not questioned and they have been unbelievable. We are very fortunate to have gotten them. We pay quite a bit, but it's worth the cost. When I was very ill on sepsis and pancreatic stones, I was in the hospital for 15 days and they were no questions asked. In the last few years, I've had treatments for my breast cancer and there were no questions about that either.",2023-04-22
2,united_american,"We've had very good luck with United American. We got it in 2001 and we have used it. My husband had open heart surgery quite a few years ago. Now, there was a discrepancy. I was just hospitalized from an accident on the farm here and we had a little problem, but it was more of the hospital. They kept saying I still owed $1,500. When we called United American, they said no, they had mailed a check. Somewhere, the check must have gotten lost, so they were going to rebill them again. But that was the only time that I had a problem. We're very satisfied with what we've got and our experience with it has been very good.",2022-10-28
3,united_american,"We've had a United American Insurance policy for a long time, and the coverage has been excellent. It's like someone sweeping then someone picking up the filings with a magnet. It's very complete. We also get the mailings showing what has been paid. At one time, we had a rep, but I have never had to call customer service, because everything seemed to be always completed. Ever since we went on Medicare, we've enjoyed the service, and we've never had any reason to even contact anyone. UAI is fabulous, and I'd recommend it to anyone. It's not inexpensive, but when it comes right down to it, it's worth every penny. I just had some major surgery and I haven't seen a bill from it.",2022-10-12
4,united_american,"We were with Cleveland Clinic and when they stopped taking HMO, United American Insurance was there to pick up the pieces. I've had them ever since and they've always been extremely good with me. When I asked them a question, they gave me an answer. They've been fair too and I have never had anybody question me. All this time, I've had three back surgeries and two other major surgeries. I've never had them deny a claim that I had to go back for. Sometimes United American Insurance is a little more expensive than HMO. But they have the same things as an HMO. I had to have my eyes examined and they do a refractory that insurances don't pay for. A gentleman went too and he had insurance for the eyes. He gave a card and was told, “We're sorry, we don't take that anymore because they don't pay their bills.” I have never had anyone tell me they wouldn't take United American Insurance.",2022-09-01
...,...,...,...
10761,american_republic,"Abusive unexplained rate increases and unexplained discount reductions on high deductible MSA/HSA plans that my agent can't explain. He says the company rep hasn't called him back.I pay my ever increasing premiums, fund my HSA's and still wind up paying paying for virtually all my own medical care. I haven't had a complete physical in years and haven't met deductible in 2 years yet my rates are spiraling out of control in apparent conflict with the rate determination process in place at the time I enrolled.",2006-08-01
10762,american_republic,"Despite a deductible of $4000 and a co-insurance of $5000, American Republic saw fit to increase my premium 56% from last year. This comes on top of annual increases since 2003 of 27% and 34%. I have never met my deductible and receive essentially nothing toward preventive care from this company. I'm tired of supporting the medical costs of people who eat at McDonald's and smoke cigarettes. I have cancelled this policy and, at 56 years old, am taking far less insurance, at one-third the cost, through AARP. Although it is not major medical, I refuse to subsidize these companies who care very little about the health (other than financial) of their clients.",2006-01-29
10763,american_republic,"I have 2 complaints against this company. We started out paying around 100.00 pr month for individual health for my husband. Within 2-3 years we got notice that it was going up to around 650.00. I called and was going to cancel when they assurred me that the increase was because this was an ancient policy. We took out another policy with them which has higher deductible and does not cover as much. The premium has more than doubled already and we have not had it a year yet. I know medical costs go up, but this is rediculous.The way I see it is that as soon as you are approved for a policy it becomes ancient.The pity of it is that my husband hardly uses it to begin with. It will be dropped the next time that I get notice of an increase! My second complaint is with their privacy issue.Last year AR was billed for charges that were supposed to go to workers comp and when I tried to tell them, I could not speak with them.My husband filled out the form giving permission for them to discuss his case, they sent him a letter to confirm that it was his signature. Before it was over he had signed 3 forms and called to verify. Last month the recovery service called me and asked all sorts of questions about past claims, yet when I called yesterday to find out the calander year date, they could not discuss anything because they could not find a permission form. We have about pulled out our hair dealing with these people and their sorry customer service.I am basically an honest person , but privacy issues go two ways. If someone makes a mistake I cannot report it because they do not have permission to speak with me. I feel that this company is price gouging to begin with - most companies give you a year before raising rates and they don't double the premium every time. They also make it so difficult to deal with them that most people give up- which probably saves on claims again. We are shopping for insurance at the moment. 
I refuse to hand them 1/3 of my income for nothing. I intend to tell everyone I know and to post online in every venue I can what a lousy deal you get with this company and recommend that they go elsewhere.If malpractice suits could be brought against customer service at American Republic I would be a billionaire soon.",2005-08-26
10764,american_republic,"We had insurance on both our sons, one who lives primarily in Boca Raton FL. We were assured by the salesman, Dale, that the president of the company approved this. Our son in Florida broke some ribs and went to the hospital and was informed that no doctor in south Fl was in American Republic's network. He was not able to get treatment without paying himself, although we paid premiums to American Republic each month since December. I emailed the company in early May and wrote to Mr Mikkelsen 3/31/5 and have never received a reply. Now this salesman, Dale, is calling our son in Florida and threatening him - 5 times today, 8/2/5 and he has a record of each call. The person who represents American Republic has threatened to hurt our son. What can we do?Our son never received proper treatment for three broken ribs except for the most minor of treatment which meant he was unable to work for a few weeks and we had to pay his rent, and basically support him. It is now August 2nd and he continues to have some discomfort.",2005-08-02


## Ownership Type

Let's add our new feature.

As of 05/08/2023:

* united_health_care: Public
* humana: Public
* kaiser: Private
* cigna_health: Public
* aetna_health: Public
* bluecross: Private
* anthem: Private
* wellcare: Public
* health_net: Public
* united_american: Public
* aarp_health: Non-Profit
* amerihealth: Private
* carefirst: Non-Profit
* ihc_health_solutions: Non-Profit
* highmark: Non-Profit
* american_republic: Private


In [None]:
#create ownership type(private, public, non-profit)
ownership_lst = []

for i in reviews_df_clean['CompanyName']:
  match i:
    case 'united_health_care':
      ownership_lst.append('Public')
    case 'humana':
      ownership_lst.append('Public')
    case 'cigna_health':
      ownership_lst.append('Public')
    case 'aetna_health':
      ownership_lst.append('Public')
    case 'wellcare':
      ownership_lst.append('Public')
    case 'health_net':
      ownership_lst.append('Public')
    case 'united_american':
      ownership_lst.append('Public')
    case 'kaiser':
      ownership_lst.append('Private')
    case 'bluecross':
      ownership_lst.append('Private')
    case 'anthem':
      ownership_lst.append('Private')
    case 'amerihealth':
      ownership_lst.append('Private')
    case 'american_republic':
      ownership_lst.append('Private')
    case 'aarp_health':
      ownership_lst.append('Non-Profit')
    case 'carefirst':
      ownership_lst.append('Non-Profit')
    case 'ihc_health_solutions':
      ownership_lst.append('Non-Profit')
    case 'highmark':
      ownership_lst.append('Non-Profit')
    case _:
      pass

len(ownership_lst)

10766
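
The same lookup can also be written as a dictionary passed to `Series.map`. The following is a minimal sketch on a toy DataFrame rather than the scraped data; `OWNERSHIP_BY_COMPANY` and `toy_df` are hypothetical names, and only a few of the companies listed above are included. One difference from the `match` loop: a company missing from the dictionary becomes `NaN` in its own row instead of being skipped, so the new column always stays aligned with the reviews.

```python
import pandas as pd

# Hypothetical lookup table; a subset of the companies listed above.
OWNERSHIP_BY_COMPANY = {
    'united_health_care': 'Public',
    'humana': 'Public',
    'kaiser': 'Private',
    'aarp_health': 'Non-Profit',
}

# Toy stand-in for reviews_df_clean (assumption: same 'CompanyName' column).
toy_df = pd.DataFrame({'CompanyName': ['humana', 'kaiser', 'aarp_health', 'unknown_co']})

# map() looks each name up in the dictionary; unmatched names become NaN.
toy_df['OwnershipType'] = toy_df['CompanyName'].map(OWNERSHIP_BY_COMPANY)

# Names that were not matched, useful as a sanity check.
unmatched = toy_df.loc[toy_df['OwnershipType'].isna(), 'CompanyName'].tolist()
print(toy_df)
print(unmatched)
```

Checking `unmatched` before moving on plays the same role as comparing `len(ownership_lst)` with the number of rows.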

In [None]:
#add ownership type
#copy first so the assignment targets an independent DataFrame, not a slice of another one
reviews_df_clean = reviews_df_clean.copy()
reviews_df_clean['OwnershipType'] = ownership_lst


In [None]:
#new column
reviews_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10766 entries, 0 to 10765
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   CompanyName    10766 non-null  object        
 1   Review         10766 non-null  object        
 2   Date           10766 non-null  datetime64[ns]
 3   OwnershipType  10766 non-null  object        
dtypes: datetime64[ns](1), object(3)
memory usage: 420.5+ KB


In [None]:
#preview with new column
reviews_df_clean.head()

Unnamed: 0,CompanyName,Review,Date,OwnershipType
0,united_american,"I've been with United American Insurance for lots of years and I just pay my bill every month. Once a year, I get a letter that they're increasing their rates, which I would prefer they didn't, but I can understand that it has to. A lot of my friends have HMOs but I like that with my insurance, I can go to whichever doctor I want. I've not had one doctor say they don't accept this insurance. Whatever Medicare paid, United American Insurance paid the other 20%. I am pleased with the service I'm getting, where everything is covered and the choices are mine. Others have to go for a second opinion but I don't. If I feel I need to see a special doctor, I can make an appointment and go. So, the flexibility is worth the money.",2023-05-11,Public
1,united_american,"United American has been fantastic. We have had not one bit of trouble collecting or paying for anything at all. So, we're extremely happy. I've had a lot of illnesses so many times and when you had as much as I've had, you get canceled pretty quickly by other companies. But United have not questioned and they have been unbelievable. We are very fortunate to have gotten them. We pay quite a bit, but it's worth the cost. When I was very ill on sepsis and pancreatic stones, I was in the hospital for 15 days and they were no questions asked. In the last few years, I've had treatments for my breast cancer and there were no questions about that either.",2023-04-22,Public
2,united_american,"We've had very good luck with United American. We got it in 2001 and we have used it. My husband had open heart surgery quite a few years ago. Now, there was a discrepancy. I was just hospitalized from an accident on the farm here and we had a little problem, but it was more of the hospital. They kept saying I still owed $1,500. When we called United American, they said no, they had mailed a check. Somewhere, the check must have gotten lost, so they were going to rebill them again. But that was the only time that I had a problem. We're very satisfied with what we've got and our experience with it has been very good.",2022-10-28,Public
3,united_american,"We've had a United American Insurance policy for a long time, and the coverage has been excellent. It's like someone sweeping then someone picking up the filings with a magnet. It's very complete. We also get the mailings showing what has been paid. At one time, we had a rep, but I have never had to call customer service, because everything seemed to be always completed. Ever since we went on Medicare, we've enjoyed the service, and we've never had any reason to even contact anyone. UAI is fabulous, and I'd recommend it to anyone. It's not inexpensive, but when it comes right down to it, it's worth every penny. I just had some major surgery and I haven't seen a bill from it.",2022-10-12,Public
4,united_american,"We were with Cleveland Clinic and when they stopped taking HMO, United American Insurance was there to pick up the pieces. I've had them ever since and they've always been extremely good with me. When I asked them a question, they gave me an answer. They've been fair too and I have never had anybody question me. All this time, I've had three back surgeries and two other major surgeries. I've never had them deny a claim that I had to go back for. Sometimes United American Insurance is a little more expensive than HMO. But they have the same things as an HMO. I had to have my eyes examined and they do a refractory that insurances don't pay for. A gentleman went too and he had insurance for the eyes. He gave a card and was told, “We're sorry, we don't take that anymore because they don't pay their bills.” I have never had anyone tell me they wouldn't take United American Insurance.",2022-09-01,Public
...,...,...,...,...
10761,american_republic,"Abusive unexplained rate increases and unexplained discount reductions on high deductible MSA/HSA plans that my agent can't explain. He says the company rep hasn't called him back.I pay my ever increasing premiums, fund my HSA's and still wind up paying paying for virtually all my own medical care. I haven't had a complete physical in years and haven't met deductible in 2 years yet my rates are spiraling out of control in apparent conflict with the rate determination process in place at the time I enrolled.",2006-08-01,Private
10762,american_republic,"Despite a deductible of $4000 and a co-insurance of $5000, American Republic saw fit to increase my premium 56% from last year. This comes on top of annual increases since 2003 of 27% and 34%. I have never met my deductible and receive essentially nothing toward preventive care from this company. I'm tired of supporting the medical costs of people who eat at McDonald's and smoke cigarettes. I have cancelled this policy and, at 56 years old, am taking far less insurance, at one-third the cost, through AARP. Although it is not major medical, I refuse to subsidize these companies who care very little about the health (other than financial) of their clients.",2006-01-29,Private
10763,american_republic,"I have 2 complaints against this company. We started out paying around 100.00 pr month for individual health for my husband. Within 2-3 years we got notice that it was going up to around 650.00. I called and was going to cancel when they assurred me that the increase was because this was an ancient policy. We took out another policy with them which has higher deductible and does not cover as much. The premium has more than doubled already and we have not had it a year yet. I know medical costs go up, but this is rediculous.The way I see it is that as soon as you are approved for a policy it becomes ancient.The pity of it is that my husband hardly uses it to begin with. It will be dropped the next time that I get notice of an increase! My second complaint is with their privacy issue.Last year AR was billed for charges that were supposed to go to workers comp and when I tried to tell them, I could not speak with them.My husband filled out the form giving permission for them to discuss his case, they sent him a letter to confirm that it was his signature. Before it was over he had signed 3 forms and called to verify. Last month the recovery service called me and asked all sorts of questions about past claims, yet when I called yesterday to find out the calander year date, they could not discuss anything because they could not find a permission form. We have about pulled out our hair dealing with these people and their sorry customer service.I am basically an honest person , but privacy issues go two ways. If someone makes a mistake I cannot report it because they do not have permission to speak with me. I feel that this company is price gouging to begin with - most companies give you a year before raising rates and they don't double the premium every time. They also make it so difficult to deal with them that most people give up- which probably saves on claims again. We are shopping for insurance at the moment. 
I refuse to hand them 1/3 of my income for nothing. I intend to tell everyone I know and to post online in every venue I can what a lousy deal you get with this company and recommend that they go elsewhere.If malpractice suits could be brought against customer service at American Republic I would be a billionaire soon.",2005-08-26,Private
10764,american_republic,"We had insurance on both our sons, one who lives primarily in Boca Raton FL. We were assured by the salesman, Dale, that the president of the company approved this. Our son in Florida broke some ribs and went to the hospital and was informed that no doctor in south Fl was in American Republic's network. He was not able to get treatment without paying himself, although we paid premiums to American Republic each month since December. I emailed the company in early May and wrote to Mr Mikkelsen 3/31/5 and have never received a reply. Now this salesman, Dale, is calling our son in Florida and threatening him - 5 times today, 8/2/5 and he has a record of each call. The person who represents American Republic has threatened to hurt our son. What can we do?Our son never received proper treatment for three broken ribs except for the most minor of treatment which meant he was unable to work for a few weeks and we had to pay his rent, and basically support him. It is now August 2nd and he continues to have some discomfort.",2005-08-02,Private


We've done quite a bit in just a few lines. We have scraped reviews for several healthcare companies, cleaned the reviews of foreign characters and emojis, attached company names and combined subsidiaries, cleaned the names, attached a date to every review, and added an ownership type for each company.

## Review Length

Let's add review length, measured in characters, to explore the relationship between sentiment and the size of the review.

In [None]:
#add review length (number of characters) as column
review_len = reviews_df_clean['Review'].str.len().tolist()

reviews_df_clean['ReviewLen'] = review_len

reviews_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10766 entries, 0 to 10765
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   CompanyName     10766 non-null  object        
 1   Review          10766 non-null  object        
 2   Date            10766 non-null  datetime64[ns]
 3   OwnershipType   10766 non-null  object        
 4   ReviewLen       10766 non-null  int64         
 5   AvgSentenceLen  10766 non-null  float64       
dtypes: datetime64[ns](1), float64(1), int64(1), object(3)
memory usage: 588.8+ KB


In [None]:
#preview data with new column
reviews_df_clean.head()

Unnamed: 0,CompanyName,Review,Date,OwnershipType,ReviewLen,AvgSentenceLen
0,united_american,"I've been with United American Insurance for lots of years and I just pay my bill every month. Once a year, I get a letter that they're increasing their rates, which I would prefer they didn't, but I can understand that it has to. A lot of my friends have HMOs but I like that with my insurance, I can go to whichever doctor I want. I've not had one doctor say they don't accept this insurance. Whatever Medicare paid, United American Insurance paid the other 20%. I am pleased with the service I'm getting, where everything is covered and the choices are mine. Others have to go for a second opinion but I don't. If I feel I need to see a special doctor, I can make an appointment and go. So, the flexibility is worth the money.",2023-05-11,Public,729,40.5
1,united_american,"United American has been fantastic. We have had not one bit of trouble collecting or paying for anything at all. So, we're extremely happy. I've had a lot of illnesses so many times and when you had as much as I've had, you get canceled pretty quickly by other companies. But United have not questioned and they have been unbelievable. We are very fortunate to have gotten them. We pay quite a bit, but it's worth the cost. When I was very ill on sepsis and pancreatic stones, I was in the hospital for 15 days and they were no questions asked. In the last few years, I've had treatments for my breast cancer and there were no questions about that either.",2023-04-22,Public,655,36.39
2,united_american,"We've had very good luck with United American. We got it in 2001 and we have used it. My husband had open heart surgery quite a few years ago. Now, there was a discrepancy. I was just hospitalized from an accident on the farm here and we had a little problem, but it was more of the hospital. They kept saying I still owed $1,500. When we called United American, they said no, they had mailed a check. Somewhere, the check must have gotten lost, so they were going to rebill them again. But that was the only time that I had a problem. We're very satisfied with what we've got and our experience with it has been very good.",2022-10-28,Public,623,34.61
3,united_american,"We've had a United American Insurance policy for a long time, and the coverage has been excellent. It's like someone sweeping then someone picking up the filings with a magnet. It's very complete. We also get the mailings showing what has been paid. At one time, we had a rep, but I have never had to call customer service, because everything seemed to be always completed. Ever since we went on Medicare, we've enjoyed the service, and we've never had any reason to even contact anyone. UAI is fabulous, and I'd recommend it to anyone. It's not inexpensive, but when it comes right down to it, it's worth every penny. I just had some major surgery and I haven't seen a bill from it.",2022-10-12,Public,683,37.94
4,united_american,"We were with Cleveland Clinic and when they stopped taking HMO, United American Insurance was there to pick up the pieces. I've had them ever since and they've always been extremely good with me. When I asked them a question, they gave me an answer. They've been fair too and I have never had anybody question me. All this time, I've had three back surgeries and two other major surgeries. I've never had them deny a claim that I had to go back for. Sometimes United American Insurance is a little more expensive than HMO. But they have the same things as an HMO. I had to have my eyes examined and they do a refractory that insurances don't pay for. A gentleman went too and he had insurance for the eyes. He gave a card and was told, “We're sorry, we don't take that anymore because they don't pay their bills.” I have never had anyone tell me they wouldn't take United American Insurance.",2022-09-01,Public,891,49.5


## Average Sentence Length

One other metric that might be of interest is the average sentence length of each review. Perhaps longer sentences belong to more subjective reviews.

In [None]:
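# add average sentence length as column
avg_sentence_len = []

for i in range(len(reviews_df_clean)):
  review = reviews_df_clean.iloc[i, 1]
  #count sentence-ending punctuation for this review (must be recomputed on every iteration)
  punc_count = review.count('.') + review.count('?') + review.count('!')
  if punc_count != 0:
    avg_sentence_len.append(round(len(review)/punc_count, 2))
  else:
    #assumption: a review with no sentence-ending punctuation counts as one sentence
    avg_sentence_len.append(float(len(review)))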

reviews_df_clean['AvgSentenceLen'] = avg_sentence_len

reviews_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10766 entries, 0 to 10765
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   CompanyName     10766 non-null  object        
 1   Review          10766 non-null  object        
 2   Date            10766 non-null  datetime64[ns]
 3   OwnershipType   10766 non-null  object        
 4   ReviewLen       10766 non-null  int64         
 5   AvgSentenceLen  10766 non-null  float64       
dtypes: datetime64[ns](1), float64(1), int64(1), object(3)
memory usage: 588.8+ KB
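
The per-review calculation can also be done without an explicit loop. Below is a minimal sketch on a toy DataFrame, assuming the same `Review` column name; `toy_df` is a hypothetical stand-in for `reviews_df_clean`, and treating a review with no sentence-ending punctuation as a single sentence is an assumption rather than something fixed by the project.

```python
import pandas as pd

# Toy stand-in for reviews_df_clean (assumption: same 'Review' column).
toy_df = pd.DataFrame({'Review': [
    'Great plan. Would buy again!',
    'no punctuation here',
    'Why was my claim denied? Nobody called back.',
]})

# Review length in characters, one value per row.
review_len = toy_df['Review'].str.len()

# Count sentence-ending punctuation per review with a regex character class.
punc_count = toy_df['Review'].str.count(r'[.?!]')

# Divide row by row; reviews with no punctuation are treated as one sentence.
toy_df['AvgSentenceLen'] = (review_len / punc_count.where(punc_count > 0, 1)).round(2)
print(toy_df)
```

Because the punctuation count is a Series, each review is divided by its own count, which is the property the metric depends on.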


In [None]:
#preview data with new column
reviews_df_clean.head()

Unnamed: 0,CompanyName,Review,Date,OwnershipType,ReviewLen,AvgSentenceLen
0,united_american,"I've been with United American Insurance for lots of years and I just pay my bill every month. Once a year, I get a letter that they're increasing their rates, which I would prefer they didn't, but I can understand that it has to. A lot of my friends have HMOs but I like that with my insurance, I can go to whichever doctor I want. I've not had one doctor say they don't accept this insurance. Whatever Medicare paid, United American Insurance paid the other 20%. I am pleased with the service I'm getting, where everything is covered and the choices are mine. Others have to go for a second opinion but I don't. If I feel I need to see a special doctor, I can make an appointment and go. So, the flexibility is worth the money.",2023-05-11,Public,729,40.5
1,united_american,"United American has been fantastic. We have had not one bit of trouble collecting or paying for anything at all. So, we're extremely happy. I've had a lot of illnesses so many times and when you had as much as I've had, you get canceled pretty quickly by other companies. But United have not questioned and they have been unbelievable. We are very fortunate to have gotten them. We pay quite a bit, but it's worth the cost. When I was very ill on sepsis and pancreatic stones, I was in the hospital for 15 days and they were no questions asked. In the last few years, I've had treatments for my breast cancer and there were no questions about that either.",2023-04-22,Public,655,36.39
2,united_american,"We've had very good luck with United American. We got it in 2001 and we have used it. My husband had open heart surgery quite a few years ago. Now, there was a discrepancy. I was just hospitalized from an accident on the farm here and we had a little problem, but it was more of the hospital. They kept saying I still owed $1,500. When we called United American, they said no, they had mailed a check. Somewhere, the check must have gotten lost, so they were going to rebill them again. But that was the only time that I had a problem. We're very satisfied with what we've got and our experience with it has been very good.",2022-10-28,Public,623,34.61
3,united_american,"We've had a United American Insurance policy for a long time, and the coverage has been excellent. It's like someone sweeping then someone picking up the filings with a magnet. It's very complete. We also get the mailings showing what has been paid. At one time, we had a rep, but I have never had to call customer service, because everything seemed to be always completed. Ever since we went on Medicare, we've enjoyed the service, and we've never had any reason to even contact anyone. UAI is fabulous, and I'd recommend it to anyone. It's not inexpensive, but when it comes right down to it, it's worth every penny. I just had some major surgery and I haven't seen a bill from it.",2022-10-12,Public,683,37.94
4,united_american,"We were with Cleveland Clinic and when they stopped taking HMO, United American Insurance was there to pick up the pieces. I've had them ever since and they've always been extremely good with me. When I asked them a question, they gave me an answer. They've been fair too and I have never had anybody question me. All this time, I've had three back surgeries and two other major surgeries. I've never had them deny a claim that I had to go back for. Sometimes United American Insurance is a little more expensive than HMO. But they have the same things as an HMO. I had to have my eyes examined and they do a refractory that insurances don't pay for. A gentleman went too and he had insurance for the eyes. He gave a card and was told, “We're sorry, we don't take that anymore because they don't pay their bills.” I have never had anyone tell me they wouldn't take United American Insurance.",2022-09-01,Public,891,49.5


# Exporting

In [None]:
#create csv
reviews_df_clean.to_csv('reviews.csv', index=False)

In [None]:
#download csv
files.download('reviews.csv')
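
CSV stores dates as plain text, so the `Date` column comes back as strings unless it is parsed on load. The following is a minimal sketch of reading an export back in, assuming a toy frame with the same column names; `toy_df` and the file name `reviews_sample.csv` are hypothetical.

```python
import pandas as pd

# Toy stand-in for the exported data (assumption: same column names).
toy_df = pd.DataFrame({
    'CompanyName': ['humana', 'kaiser'],
    'Review': ['Great plan. Would buy again!', 'Claims took months, sadly.'],
    'Date': pd.to_datetime(['2023-05-11', '2022-10-28']),
})

# index=False keeps the row index out of the file, as in the export above.
toy_df.to_csv('reviews_sample.csv', index=False)

# parse_dates restores the datetime dtype that CSV cannot store.
reloaded = pd.read_csv('reviews_sample.csv', parse_dates=['Date'])
print(reloaded.dtypes)
```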