Univariate non-graphical analysis in data science involves using statistical techniques to explore and understand the characteristics of a single variable in your data set. Here's a breakdown of the key methods:

Measures of Central Tendency:

These measures describe the "center" or average value of your data and indicate where most of the data points are concentrated. Common measures include:

Mean (Arithmetic Average): The sum of all values divided by the number of values. It's a good indicator of central tendency when the data is roughly symmetrical and free of extreme outliers (a normal distribution is one such case).
Median: The middle value when the data is ordered from least to greatest. It's less sensitive to outliers compared to the mean.
Mode: The most frequent value in the data set. It can be useful for identifying the most common category in categorical data.
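
To make the difference between these three measures concrete, here is a minimal sketch on a small made-up series of prices (the values are hypothetical, not taken from the dataset used later). The last value is a deliberate outlier, so the mean is pulled upward while the median and mode stay put.

```python
import pandas as pd

# Hypothetical nightly prices; the last value is a deliberate outlier.
prices = pd.Series([80, 90, 100, 100, 110, 1000])

mean_price = prices.mean()          # sum of values / number of values
median_price = prices.median()      # middle of the sorted values
mode_price = prices.mode().iloc[0]  # most frequent value (first one if tied)

print(mean_price, median_price, mode_price)
```

`Series.mode()` returns a Series because several values can tie for most frequent, which is why the sketch takes the first entry.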

Measures of Spread (Variability):

These measures tell you how spread out the data points are from the central tendency. Common measures include:

Range: The difference between the highest and lowest values in the data set. It can be sensitive to outliers.
Variance: The average squared deviation of each data point from the mean. Because the deviations are squared, it's sensitive to outliers.
Standard Deviation (STD): The square root of the variance. It represents the typical distance of a data point from the mean, expressed in the same units as the original data. Like the variance, it's sensitive to outliers; its advantage is interpretability, not robustness.
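
The sketch below computes these spread measures on a small made-up series (hypothetical values). One detail worth knowing: pandas divides by n - 1 by default (`ddof=1`, the sample variance), so `ddof=0` is needed to get the plain average of squared deviations described above.

```python
import pandas as pd

# Hypothetical values whose mean is exactly 5.
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

value_range = values.max() - values.min()  # highest minus lowest
pop_variance = values.var(ddof=0)          # divide by n
sample_variance = values.var()             # pandas default: divide by n - 1
sample_std = values.std()                  # square root of the sample variance

# One extreme value inflates the standard deviation as well as the variance.
with_outlier = pd.concat([values, pd.Series([100])], ignore_index=True)

print(value_range, pop_variance, sample_variance, sample_std)
print(with_outlier.std())
```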

Measures of Shape (Optional):

These measures provide insights into the distribution shape of your data (symmetrical, skewed, etc.). Common measures include:

Skewness: A statistical measure that quantifies the asymmetry of a distribution. A positive value indicates a positive skew (tail towards the right), and a negative value indicates a negative skew (tail towards the left).
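Kurtosis: A measure of how heavy the tails of a distribution are compared to a normal distribution (often described, less precisely, as how peaked or flat it is). pandas reports excess kurtosis, which is 0 for a normal distribution.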
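
As a rough illustration of how the sign of the skewness follows the direction of the tail, the sketch below uses three small made-up series (hypothetical values). `Series.skew()` and `Series.kurt()` are the pandas methods; both apply a small-sample bias correction, so results can differ slightly from other tools.

```python
import pandas as pd

symmetric = pd.Series([1, 2, 3, 4, 5])
right_tailed = pd.Series([1, 1, 2, 2, 3, 20])  # one large value stretches the right tail
left_tailed = -right_tailed                    # mirror image: tail on the left

skew_symmetric = symmetric.skew()   # close to 0
skew_right = right_tailed.skew()    # positive
skew_left = left_tailed.skew()      # negative

# Excess kurtosis: positive here because of the heavy tail.
kurt_right = right_tailed.kurt()

print(skew_symmetric, skew_right, skew_left, kurt_right)
```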

How to Perform Non-Graphical Univariate Analysis:

There are several ways to perform non-graphical univariate analysis, depending on your tools and data format:

Spreadsheets: Most spreadsheet software (like Microsoft Excel or Google Sheets) has built-in functions to calculate these measures.
Statistical Software: Statistical software packages like R, Python (with libraries like NumPy and Pandas), or specialized data analysis tools offer more extensive functionalities for data exploration and analysis.
Online Calculators: There are online calculators available that can compute these measures if you have a limited dataset.

Benefits of Non-Graphical Univariate Analysis:

Summarizes data: Provides a concise numerical summary of a single variable.
Identifies outliers: Certain measures can help identify potential outliers in your data.
Compares data sets: You can use these measures to compare the central tendency and variability of data from different sources.
Foundation for further analysis: These measures provide a starting point for further analysis with techniques like hypothesis testing or visualizations.
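
One common way to use these measures for spotting potential outliers is the 1.5 × IQR rule (Tukey's fences). The sketch below is only an illustration on made-up prices; the 1.5 multiplier is a convention, not something prescribed by this notebook.

```python
import pandas as pd

# Hypothetical prices with one suspiciously large value.
prices = pd.Series([80, 90, 100, 100, 110, 120, 1000])

q1 = prices.quantile(0.25)  # first quartile
q3 = prices.quantile(0.75)  # third quartile
iqr = q3 - q1               # interquartile range

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers.
outliers = prices[(prices < lower_fence) | (prices > upper_fence)]

print(q1, q3, iqr)
print(outliers.tolist())
```

Flagged values are candidates for inspection, not automatic deletions.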

In [5]:
import pandas as pd

In [6]:
df=pd.read_csv('Univariate Analysis non graphical.csv')

In [8]:
df

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,19-10-2018,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,21-05-2019,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.94190,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,05-07-2019,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,19-11-2018,0.10,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48901,5441,Central Manhattan/near Broadway,7989,Kate,Manhattan,Hell's Kitchen,40.76076,-73.98867,Private room,85,2,188,23-06-2019,1.50,1,39
48902,5803,"Lovely Room 1, Garden, Best Area, Legal rental",9744,Laurie,Brooklyn,South Slope,40.66829,-73.98779,Private room,89,4,167,24-06-2019,1.34,3,314
48903,6021,Wonderful Guest Bedroom in Manhattan for SINGLES,11528,Claudio,Manhattan,Upper West Side,40.79826,-73.96113,Private room,85,2,113,05-07-2019,0.91,1,333
48904,6090,West Village Nest - Superhost,11975,Alina,Manhattan,West Village,40.73530,-74.00525,Entire home/apt,120,90,27,31-10-2018,0.22,1,0


In [19]:
# Data cleaning: drop rows with any missing value, then drop exact duplicate rows
df.dropna(inplace=True)
df.drop_duplicates(inplace=True)
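
Before dropping rows it can help to see which columns hold the missing values, because `dropna()` removes a row if any column is empty. The sketch below uses a toy frame (hypothetical, modelled on row 2 of the preview above, where a listing with 0 reviews has no `last_review` or `reviews_per_month`) to show the effect.

```python
import pandas as pd

# Toy frame: the third row has no reviews, so two of its fields are missing.
toy = pd.DataFrame({
    "price": [149, 225, 150],
    "number_of_reviews": [9, 45, 0],
    "last_review": ["19-10-2018", "21-05-2019", None],
    "reviews_per_month": [0.21, 0.38, None],
})

missing_per_column = toy.isna().sum()  # count of missing values in each column
cleaned = toy.dropna()                 # drops every row with at least one missing value
rows_dropped = len(toy) - len(cleaned)

print(missing_per_column)
print(rows_dropped)
```

On the toy frame the listing without reviews disappears entirely, which is worth keeping in mind when reading summary statistics computed after the cleaning step.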

In [20]:
df

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,19-10-2018,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,21-05-2019,0.38,2,355
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,05-07-2019,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,19-11-2018,0.10,1,0
5,5099,Large Cozy 1 BR Apartment In Midtown East,7322,Chris,Manhattan,Murray Hill,40.74767,-73.975,Entire home/apt,200,3,74,22-06-2019,0.59,1,129
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48782,36425863,Lovely Privet Bedroom with Privet Restroom,83554966,Rusaa,Manhattan,Upper East Side,40.78099,-73.95366,Private room,129,1,1,07-07-2019,1.00,1,147
48790,36427429,No.2 with queen size bed,257683179,H Ai,Queens,Flushing,40.75104,-73.81459,Private room,45,1,1,07-07-2019,1.00,6,339
48799,36438336,Seas The Moment,211644523,Ben,Staten Island,Great Kills,40.54179,-74.14275,Private room,235,1,1,07-07-2019,1.00,1,87
48805,36442252,1B-1B apartment near by Metro,273841667,Blaine,Bronx,Mott Haven,40.80787,-73.924,Entire home/apt,100,1,2,07-07-2019,2.00,1,40


In [16]:
df.info()


<class 'pandas.core.frame.DataFrame'>
Index: 38821 entries, 0 to 48852
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              38821 non-null  int64  
 1   name                            38821 non-null  object 
 2   host_id                         38821 non-null  int64  
 3   host_name                       38821 non-null  object 
 4   neighbourhood_group             38821 non-null  object 
 5   neighbourhood                   38821 non-null  object 
 6   latitude                        38821 non-null  float64
 7   longitude                       38821 non-null  float64
 8   room_type                       38821 non-null  object 
 9   price                           38821 non-null  int64  
 10  minimum_nights                  38821 non-null  int64  
 11  number_of_reviews               38821 non-null  int64  
 12  last_review                     38821

In [17]:
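# Cast identifier and coordinate columns to str so they are treated as labels,
# not numeric measures (this keeps them out of df.describe())

df['id'] = df['id'].astype(str)
df['host_id'] = df['host_id'].astype(str)
df['latitude'] = df['latitude'].astype(str)
df['longitude'] = df['longitude'].astype(str)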

In [21]:
df.describe()

Unnamed: 0,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
count,38821.0,38821.0,38821.0,38821.0,38821.0,38821.0
mean,142.332526,5.86922,29.290255,1.373229,5.166611,114.886299
std,196.994756,17.389026,48.1829,1.680328,26.302954,129.52995
min,0.0,1.0,1.0,0.01,1.0,0.0
25%,69.0,1.0,3.0,0.19,1.0,0.0
50%,101.0,2.0,9.0,0.72,1.0,55.0
75%,170.0,4.0,33.0,2.02,2.0,229.0
max,10000.0,1250.0,629.0,58.5,327.0,365.0


In [26]:
df.nunique()

id                                38821
name                              38244
host_id                           30232
host_name                          9885
neighbourhood_group                   5
neighbourhood                       218
latitude                          17436
longitude                         13639
room_type                             3
price                               581
minimum_nights                       89
number_of_reviews                   393
last_review                        1764
reviews_per_month                   937
calculated_host_listings_count       47
availability_365                    366
dtype: int64

### Categorical:

##### Here `neighbourhood_group` is a categorical column, so we examine it first using non-graphical analysis.
**Explore unique values:** Categorical columns usually have far fewer unique values than numerical columns, so columns with a low count in the `df.nunique()` output above are good candidates.

In [29]:
df['neighbourhood_group'].value_counts()   # number of listings in each neighbourhood group (borough)

neighbourhood_group
Manhattan        16621
Brooklyn         16439
Queens            4572
Bronx              875
Staten Island      314
Name: count, dtype: int64

In [30]:
df['neighbourhood_group'].value_counts(normalize=True)  # normalize=True converts the counts to proportions that sum to 1 (multiply by 100 for percentages)

neighbourhood_group
Manhattan        0.428145
Brooklyn         0.423456
Queens           0.117771
Bronx            0.022539
Staten Island    0.008088
Name: proportion, dtype: float64

#### `room_type` column is also categorical


In [31]:
df['room_type'].value_counts() 

room_type
Entire home/apt    20321
Private room       17654
Shared room          846
Name: count, dtype: int64

In [33]:
df['room_type'].value_counts(normalize=True) 

room_type
Entire home/apt    0.523454
Private room       0.454754
Shared room        0.021792
Name: proportion, dtype: float64

#### `neighbourhood` is also categorical

In [35]:
df['neighbourhood'].value_counts() 

neighbourhood
Williamsburg          3163
Bedford-Stuyvesant    3141
Harlem                2204
Bushwick              1942
Hell's Kitchen        1528
                      ... 
Holliswood               2
New Dorp Beach           2
Richmondtown             1
Rossville                1
Willowbrook              1
Name: count, Length: 218, dtype: int64

In [39]:
df['neighbourhood'].value_counts().reset_index()

Unnamed: 0,neighbourhood,count
0,Williamsburg,3163
1,Bedford-Stuyvesant,3141
2,Harlem,2204
3,Bushwick,1942
4,Hell's Kitchen,1528
...,...,...
213,Holliswood,2
214,New Dorp Beach,2
215,Richmondtown,1
216,Rossville,1


In [43]:
# Turn the counts into a DataFrame and give the count column a descriptive name
df_n = df['neighbourhood'].value_counts().reset_index().rename(columns={'count': 'No_of_hotels'})

In [45]:
df_n

Unnamed: 0,neighbourhood,No_of_hotels
0,Williamsburg,3163
1,Bedford-Stuyvesant,3141
2,Harlem,2204
3,Bushwick,1942
4,Hell's Kitchen,1528
...,...,...
213,Holliswood,2
214,New Dorp Beach,2
215,Richmondtown,1
216,Rossville,1


In [47]:
df_n['No_of_hotels']>1000

0       True
1       True
2       True
3       True
4       True
       ...  
213    False
214    False
215    False
216    False
217    False
Name: No_of_hotels, Length: 218, dtype: bool

In [49]:
df_n[df_n['No_of_hotels']>1000]

Unnamed: 0,neighbourhood,No_of_hotels
0,Williamsburg,3163
1,Bedford-Stuyvesant,3141
2,Harlem,2204
3,Bushwick,1942
4,Hell's Kitchen,1528
5,East Village,1489
6,Upper West Side,1482
7,Upper East Side,1405
8,Crown Heights,1265


### Numerical:

#### From the given dataset we can say `price`, `minimum_nights` and `number_of_reviews` are numerical columns

The `bins` parameter of `value_counts` sets how many equal-width intervals the values are grouped into before counting. Here `bins=5` splits `price` into 5 intervals: 38786 listings fall in (-10.001, 2000.0], while only 2 fall in (6000.0, 8000.0].

In [51]:
df['price'].value_counts(bins=5)  

(-10.001, 2000.0]    38786
(2000.0, 4000.0]        20
(4000.0, 6000.0]         8
(8000.0, 10000.0]        5
(6000.0, 8000.0]         2
Name: count, dtype: int64

In [55]:
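# Equal-width bins put almost every listing in the first interval. Its left edge
# (-10.001) is not a real negative price: pandas pads the lowest edge so that the
# minimum price (0) is included. Define custom edges that start at 0 and are
# finer where most prices lie.

bins = (0, 50, 100, 200, 500, 2000, 10000)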


In [57]:
df['price'].value_counts(bins=bins)  

(50.0, 100.0]        14212
(100.0, 200.0]       13544
(200.0, 500.0]        5267
(-0.001, 50.0]        5176
(500.0, 2000.0]        587
(2000.0, 10000.0]       35
Name: count, dtype: int64
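
By default `value_counts` sorts the intervals by frequency, which is why the bins above appear out of order. A minimal sketch on made-up prices (hypothetical values, same bin edges) shows that passing `sort=False` keeps the intervals in ascending order, which is often easier to read as a frequency table.

```python
import pandas as pd

# Hypothetical prices; note that 0 is included in the first interval.
prices = pd.Series([0, 40, 75, 120, 180, 450, 800, 9000])
edges = [0, 50, 100, 200, 500, 2000, 10000]

# sort=False keeps the intervals in ascending bin order instead of by count.
counts = prices.value_counts(bins=edges, sort=False)

print(counts)
```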

In [60]:
df['price'].mean()

142.33252621004095

In [62]:
df['price'].std()

196.99475591833985

In [64]:
df['price'].skew()

23.673594295123014

In [69]:
df['price'].kurt()

953.4807356344944

#### Correlation refers to the statistical relationship between two variables. The correlation value, often denoted by the letter r, is a numerical measure that quantifies the strength and direction of that relationship. It ranges from -1 to +1, with interpretations as follows:

##### - Positive correlation (0 < r <= 1): As the value of one variable increases, the value of the other variable also tends to increase. The stronger the positive correlation, the closer r is to 1.

##### - Negative correlation (-1 <= r < 0): As the value of one variable increases, the value of the other variable tends to decrease. The stronger the negative correlation, the closer r is to -1.

##### - Zero correlation (r = 0): There's no linear relationship between the two variables. Changes in one variable don't predict changes in the other in a linear way, although a non-linear relationship may still exist.

In [68]:
df.corr()

ValueError: could not convert string to float: 'Clean & quiet apt home by the park'
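
The error above occurs because `df.corr()` tries to convert text columns such as `name` to numbers. One possible fix, sketched here on a toy frame (hypothetical values; the column names are borrowed from the dataset), is to keep only the numeric columns before computing the correlation. Recent pandas versions also accept `df.corr(numeric_only=True)`, which should give the same result.

```python
import pandas as pd

# Toy frame with one text column and three numeric columns.
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "price": [50, 100, 150, 200],
    "minimum_nights": [1, 2, 3, 4],
    "availability_365": [300, 200, 100, 0],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric = toy.select_dtypes(include="number")
corr = numeric.corr()

print(corr)
```

In the toy data `price` and `minimum_nights` rise together (r = 1), while `availability_365` falls as `price` rises (r = -1).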