<h1 style="color:blue;">  Scenario 12 - Part 1  </h1>

### Part 1 (Scenario 12_p1.ipynb)
- C2.S12.Py01	Clean data with TfidfVectorizer for Negative True Reviews
- C2.S12.Py02	Topic modeling using NMF for Negative True Reviews
- C2.S12.Py03	Use NMF to match Type and Status for All Reviews

### Part 2 (Scenario 12_p2.ipynb)
- C2.S12.Py04	Import data and create the vectorized Train/Test Split
- C2.S12.Py05	Sentiment Analysis for Predicting Deceptive vs. True 
- C2.S12.Py06	Sentiment Analysis for Predicting Negative vs. Positive
- C2.S12.Py07	Sentiment Analysis for Negative vs. Positive for True Reviews or Deceptive Reviews
- C2.S12.Py08	Sentiment Analysis for TypeStatus
- C2.S12.Py09	Analyze New Reviews for Predicting Deceptive vs. True


<h2 style="color:blue;">Clean data with TfidfVectorizer for Negative True Reviews    </h2>

In [4]:
#Code Block 1

import pandas as pd

In [6]:
#Code Block 2

url = 'Scenario12_Data/Scenario12_AllReviews.csv'
df =  pd.read_csv(url, index_col=0) 
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1600 entries, 0 to 1599
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Reviews     1600 non-null   object
 1   Type        1600 non-null   object
 2   Status      1600 non-null   object
 3   TypeStatus  1600 non-null   object
dtypes: object(4)
memory usage: 62.5+ KB


In [8]:
#Code Block 3

df_neg = df[df['TypeStatus']=='Negative True']
df_pos = df[df['TypeStatus']=='Positive True']

# info() prints its summary and returns None, so it does not need display()
df_neg.info()
df_pos.info()

<class 'pandas.core.frame.DataFrame'>
Index: 400 entries, 0 to 399
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Reviews     400 non-null    object
 1   Type        400 non-null    object
 2   Status      400 non-null    object
 3   TypeStatus  400 non-null    object
dtypes: object(4)
memory usage: 15.6+ KB

<class 'pandas.core.frame.DataFrame'>
Index: 400 entries, 800 to 1199
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Reviews     400 non-null    object
 1   Type        400 non-null    object
 2   Status      400 non-null    object
 3   TypeStatus  400 non-null    object
dtypes: object(4)
memory usage: 15.6+ KB


In [10]:
#Code Block 4

df_neg.head()

Unnamed: 0,Reviews,Type,Status,TypeStatus
0,My wife and I just spent a long weekend at the...,NEG,True,Negative True
1,The historic feel of the hotel really had a st...,NEG,True,Negative True
2,I haven't actually stayed at this hotel- yet- ...,NEG,True,Negative True
3,I was very much looking forward to our stay at...,NEG,True,Negative True
4,The hotel is almost always very helpful. This ...,NEG,True,Negative True


### Preprocessing

- Use TF-IDF Vectorization to create a vectorized document term matrix. 
- You may want to explore the max_df and min_df parameters.
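
To make the effect of `max_df` and `min_df` concrete, here is a minimal sketch on a made-up four-review corpus (the reviews are illustrative and not taken from the scenario data). A float `max_df` drops terms that appear in more than that fraction of documents, and an integer `min_df` drops terms that appear in fewer than that many documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical reviews, for illustration only)
docs = [
    "the room was dirty",
    "the room was noisy",
    "the staff was rude",
    "the pool was closed",
]

# No frequency filtering: every token of two or more characters is kept
loose = TfidfVectorizer()
loose.fit(docs)

# Same thresholds as the notebook, but without stop-word removal
strict = TfidfVectorizer(max_df=0.95, min_df=2)
strict.fit(docs)

print(sorted(loose.vocabulary_))
print(sorted(strict.vocabulary_))
```

In this toy corpus "the" and "was" occur in every review and are removed by `max_df=0.95`, while every word that occurs only once is removed by `min_df=2`, so only "room" survives both filters.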

In [13]:
#Code Block 5

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import TfidfTransformer 
from sklearn.feature_extraction.text import CountVectorizer

In [14]:
#Code Block 6

tfidf = TfidfVectorizer(max_df=0.95, min_df=2, stop_words='english', lowercase=True )

In [15]:
#Code Block 7

dtm = tfidf.fit_transform(df_neg['Reviews'])

In [19]:
#Code Block 8

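# dtm already holds TF-IDF weights; fitting a TfidfTransformer on it only serves to
# recover the per-term IDF values (document frequencies are counted from non-zero entries)
tfidf_transformer = TfidfTransformer(smooth_idf=True, use_idf=True)
tfidf_transformer.fit(dtm)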

### IDF Score

- Inverse Document Frequency (IDF) is a weight that reflects how rare a word is across the documents. The more documents a word appears in, the lower its IDF score, and the less that word helps to distinguish one document from another. Words that appear in only a few documents receive the highest scores.
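
As a worked check of the scores this step produces, the sketch below applies the smoothed IDF formula that scikit-learn documents for `smooth_idf=True`, idf(t) = ln((1 + n) / (1 + df(t))) + 1. The helper name `smooth_idf` is introduced here for illustration only. It assumes the 400 Negative True reviews and `min_df=2`, so the rarest terms kept in the vocabulary appear in exactly 2 reviews.

```python
import math

def smooth_idf(n_docs: int, doc_freq: int) -> float:
    """Smoothed IDF: ln((1 + n_docs) / (1 + doc_freq)) + 1."""
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

rarest = smooth_idf(400, 2)        # term found in 2 of 400 reviews
everywhere = smooth_idf(400, 400)  # term found in every review

print(round(rarest, 4), round(everywhere, 4))
```

A term present in 2 of 400 reviews scores about 5.8953, which matches the largest `idf_weights` values reported for the Negative True reviews, while a term present in every review would score exactly 1.0.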

In [24]:
#Code Block 9

# print idf values 
df_idf = pd.DataFrame(tfidf_transformer.idf_, index=tfidf.get_feature_names_out(),columns=["idf_weights"]) 
 
# sort ascending 
display(df_idf.sort_values(by=['idf_weights']).head(10))
df_idf.sort_values(by=['idf_weights']).tail(10)

Unnamed: 0,idf_weights
hotel,1.18282
room,1.238219
stay,1.666085
service,2.010355
chicago,2.038134
night,2.073981
staff,2.11116
stayed,2.149774
rooms,2.149774
location,2.223277


Unnamed: 0,idf_weights
rolled,5.895349
rolls,5.895349
floating,5.895349
flights,5.895349
fixable,5.895349
finish,5.895349
rough,5.895349
routing,5.895349
flushing,5.895349
lamp,5.895349


<h2 style="color:blue;">Topic modeling using NMF for Negative True Reviews    </h2>

**Topic Modeling** is an unsupervised learning approach to clustering documents in order to discover topics based on their contents. In that respect it is very similar to K-Means clustering.

**LDA**, or Latent Dirichlet Allocation, is a probabilistic model. To obtain cluster assignments, it uses two probability values: P(word | topic) and P(topic | document).
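
The two probabilities named above combine as a mixture: under LDA, the chance of seeing a word in a document is the sum over topics of P(word | topic) x P(topic | document). The sketch below works through that arithmetic with made-up numbers; the topic names, words, and probabilities are illustrative assumptions, not values fitted to the review data.

```python
# P(word | topic): each topic is a probability distribution over words (hypothetical values)
p_word_given_topic = {
    "service":   {"rude": 0.6, "desk": 0.3, "pool": 0.1},
    "amenities": {"rude": 0.1, "desk": 0.1, "pool": 0.8},
}

# P(topic | document): one document's mixture over topics (hypothetical values)
p_topic_given_doc = {"service": 0.75, "amenities": 0.25}

# P(word | document) = sum over topics of P(word | topic) * P(topic | document)
p_word_given_doc = {
    word: sum(
        p_word_given_topic[topic][word] * p_topic_given_doc[topic]
        for topic in p_topic_given_doc
    )
    for word in ("rude", "desk", "pool")
}

for word, prob in p_word_given_doc.items():
    print(word, round(prob, 3))
```

Because both inputs are proper probability distributions, the resulting word probabilities for the document also sum to 1.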

**Non-negative Matrix Factorization** (NMF) is a linear-algebraic model that factors high-dimensional vectors into a low-dimensional representation. Like Principal Component Analysis (PCA), it reduces dimensionality, but NMF additionally takes advantage of the fact that the input vectors are non-negative: factoring them into the lower-dimensional form forces the coefficients to be non-negative as well.
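
As a rough illustration of the factorization described above, the following sketch factors a small made-up non-negative matrix `V` into a document-topic matrix `W` and a topic-term matrix `H`, so that `V` is approximately `W @ H`. The matrix values and the choice of two components are assumptions for demonstration; `max_iter=500` is set only to give the toy problem room to converge.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy non-negative "document-term" matrix: 4 documents x 5 terms (made-up values)
V = np.array([
    [1.0, 0.8, 0.0, 0.0, 0.1],
    [0.9, 1.0, 0.1, 0.0, 0.0],
    [0.0, 0.1, 1.0, 0.9, 0.0],
    [0.0, 0.0, 0.8, 1.0, 0.2],
])

model = NMF(n_components=2, random_state=42, max_iter=500)
W = model.fit_transform(V)  # document-topic weights, shape (4, 2)
H = model.components_       # topic-term weights, shape (2, 5)

print(W.shape, H.shape)
print(bool(W.min() >= 0), bool(H.min() >= 0))  # both factors stay non-negative
print(np.linalg.norm(V - W @ H))               # reconstruction error
```

The rows of `H` play the role of topics (weights over terms), and the rows of `W` say how strongly each document expresses each topic.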

- https://medium.com/ml2vec/topic-modeling-is-an-unsupervised-learning-approach-to-clustering-documents-to-discover-topics-fdfbf30e27df

### Non-negative Matrix Factorization (3)

- Using Scikit-Learn, create an instance of NMF with 3 expected components. (Use random_state=42.)

In [28]:
#Code Block 10

from sklearn.decomposition import NMF

In [30]:
#Code Block 11

nmf_model = NMF(n_components=3,random_state=42)

In [32]:
#Code Block 12

nmf_model.fit(dtm)

### Print out the top 15 most common words for each of the 3 topics.

In [38]:
#Code Block 13

for index,topic in enumerate(nmf_model.components_):
    print(f'THE TOP 15 WORDS FOR TOPIC #{index}')
    print([tfidf.get_feature_names_out()[i] for i in topic.argsort()[-15:]])
    print('\n')

THE TOP 15 WORDS FOR TOPIC #0
['small', 'just', 'staff', 'chicago', 'nice', 'like', 'stay', 'location', 'great', 'rooms', 'bathroom', 'good', 'bed', 'hotel', 'room']


THE TOP 15 WORDS FOR TOPIC #1
['minutes', 'service', 'night', 'stay', 'asked', 'manager', 'said', 'check', 'desk', 'reservation', 'told', 'did', 'hotel', 'called', 'room']


THE TOP 15 WORDS FOR TOPIC #2
['red', 'heard', 'attempted', 'glad', 'phones', 'ran', 'scum', 'exited', 'daughter', 'wanting', 'swimming', 'phone', 'white', 'emergency', 'pool']
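
The `topic.argsort()[-15:]` idiom in the loop above returns indices in ascending order of weight, so the last word in each printed list is the one with the highest weight for that topic. The sketch below reproduces that ordering in plain Python on a made-up weight vector; the words and weights are illustrative assumptions.

```python
# Hypothetical vocabulary and one topic's weights over it
words = ["pool", "desk", "room", "bed", "staff"]
weights = [0.10, 0.70, 0.90, 0.05, 0.40]

# Equivalent of argsort(): indices ordered from lowest to highest weight
order = sorted(range(len(weights)), key=lambda i: weights[i])

# Equivalent of argsort()[-3:]: the three highest-weighted words, still ascending
top_words = [words[i] for i in order[-3:]]
print(top_words)

# Reverse the slice to list the strongest word first
print(top_words[::-1])
```

Reading each topic's list from right to left therefore gives the words in decreasing order of importance.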




### Add a new column to the df_neg dataframe that labels each review with one of the 3 topic categories.

In [41]:
#Code Block 14

df_neg.head()

Unnamed: 0,Reviews,Type,Status,TypeStatus
0,My wife and I just spent a long weekend at the...,NEG,True,Negative True
1,The historic feel of the hotel really had a st...,NEG,True,Negative True
2,I haven't actually stayed at this hotel- yet- ...,NEG,True,Negative True
3,I was very much looking forward to our stay at...,NEG,True,Negative True
4,The hotel is almost always very helpful. This ...,NEG,True,Negative True


In [43]:
#Code Block 15

topic_results = nmf_model.transform(dtm)

In [45]:
#Code Block 16

# Label each review with its highest-weighted topic. assign() returns a new
# DataFrame, which avoids the SettingWithCopyWarning raised on the df_neg slice.
df_neg = df_neg.assign(Topic3=topic_results.argmax(axis=1))
display(df_neg['Topic3'].value_counts())
df_neg.head(10)


Topic3
0    235
1    150
2     15
Name: count, dtype: int64

Unnamed: 0,Reviews,Type,Status,TypeStatus,Topic3
0,My wife and I just spent a long weekend at the...,NEG,True,Negative True,1
1,The historic feel of the hotel really had a st...,NEG,True,Negative True,0
2,I haven't actually stayed at this hotel- yet- ...,NEG,True,Negative True,1
3,I was very much looking forward to our stay at...,NEG,True,Negative True,0
4,The hotel is almost always very helpful. This ...,NEG,True,Negative True,0
5,The Swissotel is totally understaffed and lack...,NEG,True,Negative True,0
6,Do you imagine getting there for the first tim...,NEG,True,Negative True,0
7,"We stayed here for one night, and found it a h...",NEG,True,Negative True,1
8,I want to issue a travel-warning to folks who ...,NEG,True,Negative True,0
9,Months prior to my 5-night reservation with th...,NEG,True,Negative True,1
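
The topic labels above come from taking, for each review, the position of the largest value in its row of the document-topic matrix returned by `nmf_model.transform`. A minimal plain-Python sketch of that step, using a made-up matrix (the weights are assumptions, not values from the fitted model):

```python
from collections import Counter

# Hypothetical document-topic weights: one row per review, one column per topic
doc_topic = [
    [0.02, 0.11, 0.00],
    [0.09, 0.01, 0.00],
    [0.03, 0.08, 0.01],
    [0.00, 0.02, 0.15],
]

# Row-wise argmax: index of the topic with the largest weight for each review
labels = [max(range(len(row)), key=lambda t: row[t]) for row in doc_topic]
print(labels)

# Rough equivalent of value_counts(): number of reviews assigned to each topic
counts = Counter(labels)
print(counts.most_common())
```

This hard assignment discards the remaining weights, so a review that mixes two complaints is still counted under a single topic.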


### Non-negative Matrix Factorization (5)

- Using Scikit-Learn, create an instance of NMF with 5 expected components. (Use random_state=42.)

In [48]:
#Code Block 17

nmf_model = NMF(n_components=5,random_state=42)
nmf_model.fit(dtm)



### Print out the top 15 most common words for each of the 5 topics.

In [50]:
#Code Block 18

for index,topic in enumerate(nmf_model.components_):
    print(f'THE TOP 15 WORDS FOR TOPIC #{index}')
    print([tfidf.get_feature_names_out()[i] for i in topic.argsort()[-15:]])
    print('\n')

THE TOP 15 WORDS FOR TOPIC #0
['room', 'experience', 'lobby', 'desk', 'check', 'construction', 'great', 'rude', 'hotels', 'chicago', 'did', 'service', 'stay', 'staff', 'hotel']


THE TOP 15 WORDS FOR TOPIC #1
['check', 'night', 'minutes', 'manager', 'smoking', 'card', 'later', 'did', 'desk', 'asked', 'told', 'said', 'reservation', 'room', 'called']


THE TOP 15 WORDS FOR TOPIC #2
['red', 'heard', 'attempted', 'glad', 'phones', 'ran', 'daughter', 'exited', 'scum', 'wanting', 'swimming', 'phone', 'white', 'emergency', 'pool']


THE TOP 15 WORDS FOR TOPIC #3
['pillows', 'hyatt', 'comfortable', 'service', 'queen', 'stayed', 'small', 'day', 'sheets', 'bathroom', 'beds', 'size', 'king', 'room', 'bed']


THE TOP 15 WORDS FOR TOPIC #4
['view', 'like', 'location', 'water', 'free', 'great', 'shower', 'bathroom', 'really', 'floor', 'good', 'rooms', 'night', 'nice', 'room']




### Add a new column to the df_neg dataframe that labels each review with one of the 5 topic categories.

In [53]:
#Code Block 19

df_neg.head()

Unnamed: 0,Reviews,Type,Status,TypeStatus,Topic3
0,My wife and I just spent a long weekend at the...,NEG,True,Negative True,1
1,The historic feel of the hotel really had a st...,NEG,True,Negative True,0
2,I haven't actually stayed at this hotel- yet- ...,NEG,True,Negative True,1
3,I was very much looking forward to our stay at...,NEG,True,Negative True,0
4,The hotel is almost always very helpful. This ...,NEG,True,Negative True,0


In [55]:
#Code Block 20

topic_results = nmf_model.transform(dtm)

In [57]:
#Code Block 21

# Label each review with its highest-weighted topic. assign() returns a new
# DataFrame, which avoids the SettingWithCopyWarning raised on the df_neg slice.
df_neg = df_neg.assign(Topic5=topic_results.argmax(axis=1))
display(df_neg['Topic5'].value_counts())
df_neg.head(10)


Topic5
4    124
0    100
1     92
3     74
2     10
Name: count, dtype: int64

Unnamed: 0,Reviews,Type,Status,TypeStatus,Topic3,Topic5
0,My wife and I just spent a long weekend at the...,NEG,True,Negative True,1,1
1,The historic feel of the hotel really had a st...,NEG,True,Negative True,0,0
2,I haven't actually stayed at this hotel- yet- ...,NEG,True,Negative True,1,0
3,I was very much looking forward to our stay at...,NEG,True,Negative True,0,0
4,The hotel is almost always very helpful. This ...,NEG,True,Negative True,0,3
5,The Swissotel is totally understaffed and lack...,NEG,True,Negative True,0,4
6,Do you imagine getting there for the first tim...,NEG,True,Negative True,0,4
7,"We stayed here for one night, and found it a h...",NEG,True,Negative True,1,0
8,I want to issue a travel-warning to folks who ...,NEG,True,Negative True,0,0
9,Months prior to my 5-night reservation with th...,NEG,True,Negative True,1,3


### Non-negative Matrix Factorization (10)

- Using Scikit-Learn, create an instance of NMF with 10 expected components. (Use random_state=42.)

In [60]:
#Code Block 22

nmf_model = NMF(n_components=10,random_state=42)
nmf_model.fit(dtm)



### Print out the top 15 most common words for each of the 10 topics.

In [62]:
#Code Block 23

for index,topic in enumerate(nmf_model.components_):
    print(f'THE TOP 15 WORDS FOR TOPIC #{index}')
    print([tfidf.get_feature_names_out()[i] for i in topic.argsort()[-15:]])
    print('\n')

THE TOP 15 WORDS FOR TOPIC #0
['construction', 'stay', 'star', 'chicago', 'hotels', 'service', 'staff', 'like', 'nice', 'rooms', 'location', 'room', 'good', 'great', 'hotel']


THE TOP 15 WORDS FOR TOPIC #1
['hour', 'ready', 'got', 'minutes', 'wasn', 'service', 'came', 'finally', 'later', 'asked', 'didn', 'said', 'desk', 'called', 'room']


THE TOP 15 WORDS FOR TOPIC #2
['covered', 'heard', 'attempted', 'glad', 'ran', 'phones', 'daughter', 'swimming', 'exited', 'scum', 'wanting', 'phone', 'white', 'emergency', 'pool']


THE TOP 15 WORDS FOR TOPIC #3
['comfortable', 'sheets', 'sleep', 'stayed', 'hotel', 'pillows', 'hyatt', 'double', 'bathroom', 'queen', 'beds', 'size', 'king', 'room', 'bed']


THE TOP 15 WORDS FOR TOPIC #4
['bothering', 'baffling', 'primarily', 'beverages', 'happy', 'coffee', 'honored', 'dc', 'fish', 'staff', 'help', 'bags', 'washington', 'monoco', 'did']


THE TOP 15 WORDS FOR TOPIC #5
['mold', 'black', 'showers', 'soap', 'bathroom', 'broken', 'holder', 'leaked', 'took

### Add a new column to the df_neg dataframe that labels each review with one of the 10 topic categories.

In [65]:
#Code Block 24

df_neg.head()

Unnamed: 0,Reviews,Type,Status,TypeStatus,Topic3,Topic5
0,My wife and I just spent a long weekend at the...,NEG,True,Negative True,1,1
1,The historic feel of the hotel really had a st...,NEG,True,Negative True,0,0
2,I haven't actually stayed at this hotel- yet- ...,NEG,True,Negative True,1,0
3,I was very much looking forward to our stay at...,NEG,True,Negative True,0,0
4,The hotel is almost always very helpful. This ...,NEG,True,Negative True,0,3


In [67]:
#Code Block 25

topic_results = nmf_model.transform(dtm)

In [69]:
#Code Block 26

# Label each review with its highest-weighted topic. assign() returns a new
# DataFrame, which avoids the SettingWithCopyWarning raised on the df_neg slice.
df_neg = df_neg.assign(Topic10=topic_results.argmax(axis=1))
display(df_neg['Topic10'].value_counts())
df_neg.head(20)


Topic10
0    75
1    60
3    60
9    55
8    54
7    52
4    17
5    13
6     9
2     5
Name: count, dtype: int64

Unnamed: 0,Reviews,Type,Status,TypeStatus,Topic3,Topic5,Topic10
0,My wife and I just spent a long weekend at the...,NEG,True,Negative True,1,1,1
1,The historic feel of the hotel really had a st...,NEG,True,Negative True,0,0,9
2,I haven't actually stayed at this hotel- yet- ...,NEG,True,Negative True,1,0,7
3,I was very much looking forward to our stay at...,NEG,True,Negative True,0,0,0
4,The hotel is almost always very helpful. This ...,NEG,True,Negative True,0,3,3
5,The Swissotel is totally understaffed and lack...,NEG,True,Negative True,0,4,8
6,Do you imagine getting there for the first tim...,NEG,True,Negative True,0,4,1
7,"We stayed here for one night, and found it a h...",NEG,True,Negative True,1,0,8
8,I want to issue a travel-warning to folks who ...,NEG,True,Negative True,0,0,8
9,Months prior to my 5-night reservation with th...,NEG,True,Negative True,1,3,3


In [71]:
#Code Block 27

print('')
print('-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=-=-=-=-==-')
print('-=-=-=-=-=-=-=-=- Topic3 compared to Topic5 -=-=-=-=-=-=-=-=-')
display(pd.crosstab(df_neg['Topic3'], df_neg['Topic5']))
print('')
print('-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=-=-=-=-==-')
print('-=-=-=-=-=-=-=-=- Topic3 compared to Topic10 -=-=-=-=-=-=-=-=-')
display(pd.crosstab(df_neg['Topic3'], df_neg['Topic10']))
print('')
print('-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=-=-=-=-==-')
print('-=-=-=-=-=-=-=-=- Topic5 compared to Topic10 -=-=-=-=-=-=-=-=-')
pd.crosstab(df_neg['Topic5'], df_neg['Topic10'])


-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=-=-=-=-==-
-=-=-=-=-=-=-=-=- Topic3 compared to Topic5 -=-=-=-=-=-=-=-=-


Topic3 \ Topic5,0,1,2,3,4
0,61,0,0,58,116
1,39,92,0,14,5
2,0,0,10,2,3



-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=-=-=-=-==-
-=-=-=-=-=-=-=-=- Topic3 compared to Topic10 -=-=-=-=-=-=-=-=-


Topic3 \ Topic10,0,1,2,3,4,5,6,7,8,9
0,72,6,0,48,7,12,2,4,42,42
1,3,54,0,10,10,1,6,48,8,10
2,0,0,5,2,0,0,1,0,4,3



-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=-=-=-=-==-
-=-=-=-=-=-=-=-=- Topic5 compared to Topic10 -=-=-=-=-=-=-=-=-


Topic5 \ Topic10,0,1,2,3,4,5,6,7,8,9
0,33,7,0,0,12,2,1,22,9,14
1,1,45,0,0,3,0,6,28,4,5
2,0,0,5,0,0,0,1,0,2,2
3,2,2,0,57,2,1,0,1,5,4
4,39,6,0,3,0,10,1,1,34,30


<h2 style="color:blue;">   Use NMF to match Type and Status for All Reviews </h2>

### Process All Reviews to see if the categories match Type and Status

In [74]:
#Code Block 28

dtm_all = tfidf.fit_transform(df['Reviews'])

In [76]:
#Code Block 29

tfidf_transformer=TfidfTransformer(smooth_idf=True,use_idf=True) 
tfidf_transformer.fit(dtm_all)

### IDF Score

- Inverse Document Frequency (IDF) is a weight that reflects how rare a word is across the documents. The more documents a word appears in, the lower its IDF score, and the less that word helps to distinguish one document from another. Words that appear in only a few documents receive the highest scores.

In [81]:
#Code Block 30

# print idf values 
df_idf = pd.DataFrame(tfidf_transformer.idf_, index=tfidf.get_feature_names_out(),columns=["idf_weights"]) 
 
# sort ascending 
df_idf.sort_values(by=['idf_weights'])

Unnamed: 0,idf_weights
hotel,1.163879
room,1.313625
chicago,1.502119
stay,1.596192
staff,1.871106
...,...
shaving,7.279771
sharp,7.279771
greets,7.279771
grilled,7.279771


### Non-negative Matrix Factorization (4)

- Using Scikit-Learn, create an instance of NMF with 4 expected components. (Use random_state=42.)

In [84]:
#Code Block 31

nmf_model = NMF(n_components=4,random_state=42)
nmf_model.fit(dtm_all)



#### TASK: Print out the top 15 most common words for each of the 4 topics.

In [98]:
#Code Block 32

for index,topic in enumerate(nmf_model.components_):
    print(f'THE TOP 15 WORDS FOR TOPIC #{index}')
    print([tfidf.get_feature_names_out()[i] for i in topic.argsort()[-15:]])
    print('\n')

THE TOP 15 WORDS FOR TOPIC #0
['stayed', 'downtown', 'time', 'home', 'visit', 'amazing', 'rooms', 'beautiful', 'recommend', 'staff', 'definitely', 'place', 'stay', 'chicago', 'hotel']


THE TOP 15 WORDS FOR TOPIC #1
['said', 'hotel', 'minutes', 'finally', 'asked', 'got', 'went', 'arrived', 'reservation', 'told', 'check', 'called', 'did', 'desk', 'room']


THE TOP 15 WORDS FOR TOPIC #2
['hotel', 'large', 'shopping', 'friendly', 'helpful', 'nice', 'excellent', 'comfortable', 'michigan', 'view', 'clean', 'staff', 'room', 'location', 'great']


THE TOP 15 WORDS FOR TOPIC #3
['rooms', 'people', 'hotels', 'bathroom', 'really', 'just', 'night', 'nice', 'small', 'price', 'better', 'good', 'room', 'like', 'hotel']




#### TASK: Add a new column to the original dataframe that labels each review with one of the 4 topic categories.

In [90]:
#Code Block 32

topic_results = nmf_model.transform(dtm_all)

# Label each review with its highest-weighted topic
df['Topic4'] = topic_results.argmax(axis=1)
display(df['Topic4'].value_counts())
df.head(20)

Topic4
3    470
1    422
2    358
0    350
Name: count, dtype: int64

Unnamed: 0,Reviews,Type,Status,TypeStatus,Topic4
0,My wife and I just spent a long weekend at the...,NEG,True,Negative True,1
1,The historic feel of the hotel really had a st...,NEG,True,Negative True,3
2,I haven't actually stayed at this hotel- yet- ...,NEG,True,Negative True,1
3,I was very much looking forward to our stay at...,NEG,True,Negative True,3
4,The hotel is almost always very helpful. This ...,NEG,True,Negative True,1
5,The Swissotel is totally understaffed and lack...,NEG,True,Negative True,3
6,Do you imagine getting there for the first tim...,NEG,True,Negative True,1
7,"We stayed here for one night, and found it a h...",NEG,True,Negative True,3
8,I want to issue a travel-warning to folks who ...,NEG,True,Negative True,3
9,Months prior to my 5-night reservation with th...,NEG,True,Negative True,1


In [92]:
#Code Block 33

pd.crosstab(df['TypeStatus'], df['Topic4'])

TypeStatus \ Topic4,0,1,2,3
Negative Deceptive,31,224,1,144
Negative True,5,159,23,213
Positive Deceptive,268,17,78,37
Positive True,46,22,256,76


In [94]:
#Code Block 34

pd.crosstab(df['Type'], df['Topic4'])

Type \ Topic4,0,1,2,3
NEG,36,383,24,357
POS,314,39,334,113
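
One way to put a number on how well the four topics match the labels is a purity-style score: map each topic to its most frequent label and count the share of reviews that carry that label. The sketch below does this with the counts copied from the two crosstabs above; the helper name `majority_agreement` is introduced here for illustration and is not part of the notebook.

```python
def majority_agreement(crosstab):
    """Share of reviews whose label is the most frequent label within their topic."""
    rows = list(crosstab.values())
    n_topics = len(rows[0])
    total = sum(sum(row) for row in rows)
    # For each topic (column), count only the reviews carrying the majority label
    matched = sum(max(row[t] for row in rows) for t in range(n_topics))
    return matched / total

# Counts per topic (columns 0..3), copied from the crosstabs above
type_by_topic = {
    "NEG": [36, 383, 24, 357],
    "POS": [314, 39, 334, 113],
}
typestatus_by_topic = {
    "Negative Deceptive": [31, 224, 1, 144],
    "Negative True":      [5, 159, 23, 213],
    "Positive Deceptive": [268, 17, 78, 37],
    "Positive True":      [46, 22, 256, 76],
}

print(round(majority_agreement(type_by_topic), 4))
print(round(majority_agreement(typestatus_by_topic), 4))
```

On these counts, about 87% of reviews fall in a topic whose majority Type matches their own, against about 60% for the four-way TypeStatus label. This suggests the topics track sentiment (Type) more closely than they track deceptiveness (Status), although a purity score like this is optimistic because the mapping is chosen after seeing the labels.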
