# Exploring imported data

A quick way to find out what the imported data contains.

---

- Number of rows - use `len(objectname)`
```python
len(stats)

#output
195
```
Tip: When importing data, add a comment recording the number of rows, e.g. "195 rows imported". This gives you traceability.  
    If the count differs from the source file, something went wrong during the import. Here it confirms that the whole data set, which contains 195 rows, was imported.
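
The tip above can also be written as a check that runs on every import. This is only a minimal sketch: the in-memory CSV stands in for the real file, and `EXPECTED_ROWS` is a hypothetical constant you would record by hand.

```python
import io

import pandas as pd

# A three-row stand-in for the real file (rows taken from the data set in these notes).
csv_text = """Country Name,Country Code,Birth rate,Internet users,Income Group
Aruba,ABW,10.244,78.9,High income
Afghanistan,AFG,35.253,5.9,Low income
Angola,AGO,45.985,19.1,Upper middle income
"""

EXPECTED_ROWS = 3  # hypothetical: the row count you expect from the source file

sample = pd.read_csv(io.StringIO(csv_text))

# Fail loudly if rows were lost (or duplicated) during the import.
assert len(sample) == EXPECTED_ROWS, f"expected {EXPECTED_ROWS} rows, got {len(sample)}"
print(f"{len(sample)} rows imported")
```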

+ What are your column names - use `objectname.columns`

```python
stats.columns 

#output
Index(['Country Name', 'Country Code', 'Birth rate', 'Internet users',
       'Income Group'],
      dtype='object')
```

---

* Count number of columns - use `len(objectname.columns)`

```python
len(stats.columns)

#output
5

#You have 5 columns (variables)
```

**Note** - `stats.columns` is itself an object (an `Index`) that belongs to the data frame. Attributes like this are built into the DataFrame object.
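
The row count and the column count can also be read in one step from the `shape` attribute. A small sketch, assuming a three-row stand-in frame called `sample` (not the real `stats` object):

```python
import pandas as pd

# Small stand-in frame; the values come from the data set in these notes.
sample = pd.DataFrame({
    "Country Name": ["Aruba", "Afghanistan", "Angola"],
    "Birth rate": [10.244, 35.253, 45.985],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = sample.shape
print(n_rows, n_cols)
```

`n_rows` matches `len(sample)` and `n_cols` matches `len(sample.columns)`.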

___

* Check that every column has the same number of non-missing values - use `objectname.count()`

```python
stats.count()

#output
Country Name      195
Country Code      195
Birth rate        195
Internet users    195
Income Group      195
dtype: int64
```
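
`count()` only counts non-missing cells, so a column with gaps reports a smaller number than `len()`. A minimal sketch of that behaviour, assuming a stand-in frame with one missing value added purely for illustration:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "Country Name": ["Aruba", "Afghanistan", "Angola"],
    "Birth rate": [10.244, np.nan, 45.985],  # missing value added for illustration
})

counts = sample.count()  # non-missing cells per column

# Columns whose count is below the row count contain missing values.
incomplete = counts[counts < len(sample)]
print(incomplete)
```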

---

+ How to see the top rows of data set - use `objectname.head()`

```python
stats.head()

#This gives you the first 5 rows.

#To get more than 5 rows, pass the number of rows inside the parentheses.
stats.head(10)
```

---

- How to see the last rows of your data set - use `objectname.tail()`

```python
#To get the last five rows
stats.tail()

#To get more than the last five rows
stats.tail(10)
```
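
To see how the row-count argument behaves, here is a small sketch on a hypothetical ten-row frame (the column name `n` is made up for the example):

```python
import pandas as pd

# Ten rows numbered 0..9 so the selected rows are easy to recognise.
sample = pd.DataFrame({"n": range(10)})

first_three = sample.head(3)  # rows 0, 1, 2
last_three = sample.tail(3)   # rows 7, 8, 9

print(first_three)
print(last_three)
```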

___
__Note__ - `head()` and `tail()` are methods of the DataFrame object.
___

- Get some information on the columns - use `objectname.info()`  **Important**

This is like the `str()` function in R. `str()` is a bit more powerful, as it also identifies categorical variables (factors).

```python
stats.info()

#output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    195 non-null    object 
 1   Country Code    195 non-null    object 
 2   Birth rate      195 non-null    float64
 3   Internet users  195 non-null    float64
 4   Income Group    195 non-null    object 
dtypes: float64(2), object(3)
memory usage: 7.7+ KB
```
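This gives you the number of rows (RangeIndex), the column names, the number of non-null values in each column, and the **data type (Dtype)** of each column.  
_In this output, strings are stored with the `object` data type._  

**non-null** - means the cell is not empty. Here every column has 195 non-null values, so no column has missing cells.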
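
The same two pieces of information can be pulled out as objects instead of printed text. A minimal sketch, assuming a stand-in frame with one missing value added for illustration:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "Country Name": ["Aruba", "Afghanistan", "Angola"],
    "Internet users": [78.9, np.nan, 19.1],  # missing value added for illustration
})

# Data type of each column, as shown in the Dtype column of info().
print(sample.dtypes)

# Number of missing cells per column (the complement of the non-null count).
missing_per_column = sample.isna().sum()
print(missing_per_column)
```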

___

+ Get stats on the columns - use `objectname.describe()`  __Important__

This is similar to `summary()` in R.  

```python
stats.describe()

#output
	Birth rate	Internet users
count	195.000000	195.000000
mean	21.469928	42.076471
std	10.605467	29.030788
min	7.900000	0.900000
25%	12.120500	14.520000
50%	19.680000	41.000000
75%	29.759500	66.225000
max	49.661000	96.546800
```

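This gives you the count, mean, std, min, quartiles (25%, 50%, 75%) and max for the numerical columns.

Before you calculate the median, you arrange the numbers in ascending order.  
The median is the middle value.

25% - first quartile: a quarter of the values lie below it (roughly the median of the lower half)

50% - median of the whole data

75% - third quartile: three quarters of the values lie below it (roughly the median of the upper half)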
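
A tiny worked example makes the quartiles concrete. This sketch uses five made-up values and pandas' default (linear) interpolation:

```python
import pandas as pd

# Unsorted on purpose: pandas orders the values internally.
values = pd.Series([5, 1, 4, 2, 3])

q1 = values.quantile(0.25)  # first quartile
q2 = values.quantile(0.50)  # median
q3 = values.quantile(0.75)  # third quartile
print(q1, q2, q3)  # 2.0 3.0 4.0

# describe() reports the same numbers in its 25%, 50% and 75% rows.
summary = values.describe()
```

Sorted, the values are 1, 2, 3, 4, 5: the middle value is 3, a quarter of the way along is 2, and three quarters of the way along is 4.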

___

- How to transpose - use `objectname.describe().transpose()`
```python
stats.describe().transpose()

#output

	count	mean	std	min	25%	50%	75%	max
Birth rate	195.0	21.469928	10.605467	7.9	12.1205	19.68	29.7595	49.6610
Internet users	195.0	42.076471	29.030788	0.9	14.5200	41.00	66.2250	96.5468
```

**Notice** - `stats.describe()` returns an object (a DataFrame), and that object is what you are transposing

What is transposing?  
Converting columns into rows and vice-versa.
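
The effect is easiest to see in the shape of the result. A minimal sketch, assuming a small stand-in frame with two numeric columns:

```python
import pandas as pd

sample = pd.DataFrame({
    "Birth rate": [10.244, 35.253, 45.985],
    "Internet users": [78.9, 5.9, 19.1],
})

summary = sample.describe()  # 8 statistics as rows, 2 columns
flipped = summary.T          # .T is shorthand for .transpose()

print(summary.shape, flipped.shape)  # (8, 2) (2, 8)
```

After transposing, each original column becomes a row and each statistic becomes a column.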


In [15]:
#Revision 

import pandas as pd
import os

print(os.getcwd())
#importing csv file into python

pd.read_csv('demographic.csv')

#put this file into an object

stats = pd.read_csv('demographic.csv')

#print that object
stats

/Users/rajanbawa/Documents/Python


,Country Name,Country Code,Birth rate,Internet users,Income Group
0,Aruba,ABW,10.244,78.9,High income
1,Afghanistan,AFG,35.253,5.9,Low income
2,Angola,AGO,45.985,19.1,Upper middle income
3,Albania,ALB,12.877,57.2,Upper middle income
4,United Arab Emirates,ARE,11.044,88.0,High income
...,...,...,...,...,...
190,"Yemen, Rep.",YEM,32.947,20.0,Lower middle income
191,South Africa,ZAF,20.850,46.5,Upper middle income
192,"Congo, Dem. Rep.",COD,42.394,2.2,Low income
193,Zambia,ZMB,40.471,15.4,Lower middle income


In [16]:
#How to do the initial exploration of the data that you have imported

#number of rows
len(stats)

195

In [17]:
#column names
stats.columns

Index(['Country Name', 'Country Code', 'Birth rate', 'Internet users',
       'Income Group'],
      dtype='object')

In [18]:
#number of columns
len(stats.columns)

5

In [22]:
stats.count()

Country Name      195
Country Code      195
Birth rate        195
Internet users    195
Income Group      195
dtype: int64

In [24]:
#get the top rows
stats.head(10)

,Country Name,Country Code,Birth rate,Internet users,Income Group
0,Aruba,ABW,10.244,78.9,High income
1,Afghanistan,AFG,35.253,5.9,Low income
2,Angola,AGO,45.985,19.1,Upper middle income
3,Albania,ALB,12.877,57.2,Upper middle income
4,United Arab Emirates,ARE,11.044,88.0,High income
5,Argentina,ARG,17.716,59.9,High income
6,Armenia,ARM,13.308,41.9,Lower middle income
7,Antigua and Barbuda,ATG,16.447,63.4,High income
8,Australia,AUS,13.2,83.0,High income
9,Austria,AUT,9.4,80.6188,High income


In [25]:
stats.tail()

,Country Name,Country Code,Birth rate,Internet users,Income Group
190,"Yemen, Rep.",YEM,32.947,20.0,Lower middle income
191,South Africa,ZAF,20.85,46.5,Upper middle income
192,"Congo, Dem. Rep.",COD,42.394,2.2,Low income
193,Zambia,ZMB,40.471,15.4,Lower middle income
194,Zimbabwe,ZWE,35.715,18.5,Low income


In [26]:
stats.tail(10)

,Country Name,Country Code,Birth rate,Internet users,Income Group
185,Virgin Islands (U.S.),VIR,10.7,45.3,High income
186,Vietnam,VNM,15.537,43.9,Lower middle income
187,Vanuatu,VUT,26.739,11.3,Lower middle income
188,West Bank and Gaza,PSE,30.394,46.6,Lower middle income
189,Samoa,WSM,26.172,15.3,Lower middle income
190,"Yemen, Rep.",YEM,32.947,20.0,Lower middle income
191,South Africa,ZAF,20.85,46.5,Upper middle income
192,"Congo, Dem. Rep.",COD,42.394,2.2,Low income
193,Zambia,ZMB,40.471,15.4,Lower middle income
194,Zimbabwe,ZWE,35.715,18.5,Low income


In [29]:
stats.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    195 non-null    object 
 1   Country Code    195 non-null    object 
 2   Birth rate      195 non-null    float64
 3   Internet users  195 non-null    float64
 4   Income Group    195 non-null    object 
dtypes: float64(2), object(3)
memory usage: 7.7+ KB


In [31]:
stats.describe()

,Birth rate,Internet users
count,195.0,195.0
mean,21.469928,42.076471
std,10.605467,29.030788
min,7.9,0.9
25%,12.1205,14.52
50%,19.68,41.0
75%,29.7595,66.225
max,49.661,96.5468


In [32]:
stats.describe().transpose()

,count,mean,std,min,25%,50%,75%,max
Birth rate,195.0,21.469928,10.605467,7.9,12.1205,19.68,29.7595,49.661
Internet users,195.0,42.076471,29.030788,0.9,14.52,41.0,66.225,96.5468


In [43]:
len(stats)#number of rows
stats.columns
stats.count()
len(stats.columns)
stats.info() #str
stats.describe()#summary
stats.transpose()
stats.head(7)
stats.tail(7)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    195 non-null    object 
 1   Country Code    195 non-null    object 
 2   Birth rate      195 non-null    float64
 3   Internet users  195 non-null    float64
 4   Income Group    195 non-null    object 
dtypes: float64(2), object(3)
memory usage: 7.7+ KB


,Country Name,Country Code,Birth rate,Internet users,Income Group
188,West Bank and Gaza,PSE,30.394,46.6,Lower middle income
189,Samoa,WSM,26.172,15.3,Lower middle income
190,"Yemen, Rep.",YEM,32.947,20.0,Lower middle income
191,South Africa,ZAF,20.85,46.5,Upper middle income
192,"Congo, Dem. Rep.",COD,42.394,2.2,Low income
193,Zambia,ZMB,40.471,15.4,Lower middle income
194,Zimbabwe,ZWE,35.715,18.5,Low income
