# Problem

### Write a Pandas program to read the CSV file **diamonds.csv** from a specified source and print the first 10 rows.
The CSV file has the following columns.

| Column Name | Description |
| :- | :- |
| price | price in US dollars (\$326--\$18,823) |
| carat | weight of the diamond (0.2--5.01) |
| cut | quality of the cut (Fair, Good, Very Good, Premium, Ideal) |
| color | diamond colour, from J (worst) to D (best) |
| clarity | a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)) |
| x | length in mm (0--10.74) |
| y | width in mm (0--58.9) |
| z | depth in mm (0--31.8) |
| depth | total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79) |
| table | width of top of diamond relative to widest point (43--95) |
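
The `depth` formula in the table can be sketched on a single row. This is only an illustration: the `x`, `y`, `z` values are copied from the first row of the sample output, `sample` and `depth_pct` are made-up names, and the multiplication by 100 is an assumption based on the stated 43--79 range. The result need not match the stored `depth` value exactly.

```python
import pandas as pd

# One toy row (x, y, z in mm), not the full dataset.
sample = pd.DataFrame({"x": [3.95], "y": [3.98], "z": [2.43]})

# depth % = z / mean(x, y) * 100 = 2 * z / (x + y) * 100
# The factor of 100 is assumed so the result lands on a percentage scale.
sample["depth_pct"] = 2 * sample["z"] / (sample["x"] + sample["y"]) * 100

print(sample["depth_pct"].round(2))
```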

In [61]:
import pandas as pd

#pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', 50)

diamonds = pd.read_csv('diamonds.csv')

print("First 10 rows:")
print(diamonds.head(10))

First 10 rows:
   carat        cut color clarity  depth  table  price     x     y     z
0   0.23      Ideal     E     SI2   61.5   55.0  326.0  3.95  3.98  2.43
1   0.21    Premium     E     SI1   59.8   61.0  326.0  3.89  3.84  2.31
2   0.23       Good     E     VS1   56.9   65.0  327.0  4.05  4.07  2.31
3   0.29    Premium     I     NaN   62.4   58.0  334.0  4.20  4.23  2.63
4   0.31       Good     J     SI2   63.3   58.0  335.0  4.34  4.35  2.75
5   0.24        NaN     J    VVS2   62.8   57.0  336.0  3.94  3.96  2.48
6   0.24  Very Good     I    VVS1   62.3   57.0  336.0  3.95  3.98  2.47
7   0.26  Very Good     H     SI1   61.9   55.0  337.0  4.07  4.11  2.53
8   0.22       Fair     E     VS2   65.1   61.0  337.0  3.87  3.78  2.49
9   0.23  Very Good     H     VS1   59.4   61.0  338.0  4.00  4.05  2.39
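
`read_csv` accepts a path, a URL, or a file-like object as its source. A minimal sketch, using an in-memory string as a stand-in for `diamonds.csv` (the three rows are copied from the output above; `csv_text` and `toy` are illustrative names, not part of the exercise):

```python
import io

import pandas as pd

# In-memory stand-in for the real file; only a few columns are reproduced.
csv_text = "carat,cut,color\n0.23,Ideal,E\n0.21,Premium,E\n0.23,Good,E\n"

# Any file-like object works as the source, just like a path or URL.
toy = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows as a new DataFrame.
first_two = toy.head(2)
print(first_two)
```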


### Write a Pandas program to find the number of rows and columns of the diamonds DataFrame.



In [62]:
rows = len(diamonds.index)
columns = len(diamonds.columns)
print("Num of rows:"+str(rows))
print("Num of columns:"+str(columns))

Num of rows:53940
Num of columns:10
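
The same counts are also available in one step from `DataFrame.shape`, which is a `(rows, columns)` tuple. A small sketch on a stand-in frame (`df`, `n_rows` and `n_cols` are illustrative names; in the notebook you would use `diamonds` instead):

```python
import pandas as pd

# Small stand-in frame; the real notebook would use the loaded `diamonds`.
df = pd.DataFrame({"carat": [0.23, 0.21, 0.23], "price": [326.0, 326.0, 327.0]})

# shape is a (number of rows, number of columns) tuple.
n_rows, n_cols = df.shape

print(f"Num of rows: {n_rows}")
print(f"Num of columns: {n_cols}")
```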


### Write a Pandas program to find the data type of each column of the diamonds DataFrame.

In [63]:
diamonds.dtypes

carat      float64
cut         object
color       object
clarity     object
depth      float64
table      float64
price      float64
x          float64
y          float64
z          float64
dtype: object
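
Once the dtypes are known, `select_dtypes` can pick out columns by type. A brief sketch on a one-row stand-in frame (`toy` and `numeric_cols` are illustrative names):

```python
import pandas as pd

# One numeric, one text and one more numeric column, mirroring the dtypes above.
toy = pd.DataFrame({"carat": [0.23], "cut": ["Ideal"], "price": [326.0]})

# Keep only the numeric columns (here the float64 ones).
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```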

### Write a Pandas program to create a new 'quality-color' column that concatenates the data from the 'cut' column with the data from the 'color' column.
### E.g., if 'cut' is 'Ideal' and 'color' is 'E', 'quality-color' is 'Ideal E'.

In [64]:
diamonds['quality-color'] = diamonds['cut'] + ' ' + diamonds['color']
diamonds.head()

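|  | carat | cut | color | clarity | depth | table | price | x | y | z | quality-color |
| :- | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- |
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326.0 | 3.95 | 3.98 | 2.43 | Ideal E |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326.0 | 3.89 | 3.84 | 2.31 | Premium E |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327.0 | 4.05 | 4.07 | 2.31 | Good E |
| 3 | 0.29 | Premium | I | NaN | 62.4 | 58.0 | 334.0 | 4.2 | 4.23 | 2.63 | Premium I |
| 4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335.0 | 4.34 | 4.35 | 2.75 | Good J |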
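
One thing to watch with `+` on string columns: a missing value in either part makes the combined value missing too. A minimal sketch on a toy frame with deliberately missing entries (`toy` and `n_missing` are illustrative names):

```python
import pandas as pd

# Toy data: the second row has no cut, the third row has no color.
toy = pd.DataFrame({"cut": ["Ideal", None, "Good"], "color": ["E", "J", None]})

# Plain `+` propagates missing values: a row with a missing part becomes NaN.
toy["quality-color"] = toy["cut"] + " " + toy["color"]

# Count how many combined labels ended up missing.
n_missing = int(toy["quality-color"].isna().sum())
print(toy)
print("Missing labels:", n_missing)
```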


### Now that we have the quality-color column, we do not need the cut column.
### Write a Pandas program to remove the 'cut' column of the diamonds DataFrame.



In [65]:
diamonds = diamonds.drop("cut",axis=1)
diamonds.head()

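|  | carat | color | clarity | depth | table | price | x | y | z | quality-color |
| :- | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- |
| 0 | 0.23 | E | SI2 | 61.5 | 55.0 | 326.0 | 3.95 | 3.98 | 2.43 | Ideal E |
| 1 | 0.21 | E | SI1 | 59.8 | 61.0 | 326.0 | 3.89 | 3.84 | 2.31 | Premium E |
| 2 | 0.23 | E | VS1 | 56.9 | 65.0 | 327.0 | 4.05 | 4.07 | 2.31 | Good E |
| 3 | 0.29 | I | NaN | 62.4 | 58.0 | 334.0 | 4.2 | 4.23 | 2.63 | Premium I |
| 4 | 0.31 | J | SI2 | 63.3 | 58.0 | 335.0 | 4.34 | 4.35 | 2.75 | Good J |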
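
An equivalent and arguably more readable spelling is the `columns=` keyword, which avoids having to remember what `axis=1` means. A sketch on a one-row stand-in frame (`toy`, `trimmed` and `remaining` are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame({"cut": ["Ideal"], "color": ["E"], "price": [326.0]})

# Same effect as toy.drop("cut", axis=1); drop returns a new DataFrame.
trimmed = toy.drop(columns=["cut"])

remaining = list(trimmed.columns)
print(remaining)
```

Because `drop` returns a copy by default, the result has to be assigned back (as the cell above does with `diamonds = ...`) for the change to persist.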


### Write a Pandas program to sort the diamonds DataFrame by 'color' in ascending order.

In [66]:
diamonds.sort_values(['color'], ascending = True)

|  | carat | color | clarity | depth | table | price | x | y | z | quality-color |
| :- | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- |
| 53939 | 0.75 | D | SI2 | 62.2 | 55.0 | 2757.0 | 5.83 | NaN | 3.64 | Ideal D |
| 7817 | 1.00 | D | SI2 | 63.5 | 59.0 | 4295.0 | 6.35 | 6.32 | 4.02 | Very Good D |
| 7816 | 1.00 | D | SI2 | 57.8 | 58.0 | 4295.0 | 6.61 | 6.55 | 3.80 | Good D |
| 7815 | 1.00 | D | SI2 | 61.5 | 63.0 | 4295.0 | 6.32 | 6.27 | 3.87 | Very Good D |
| 7809 | 0.56 | D | IF | 61.9 | 57.0 | 4293.0 | 5.28 | 5.31 | 3.28 | Ideal D |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4861 | 1.00 | J | SI1 | 60.8 | 58.0 | 3712.0 | 6.39 | 6.44 | 3.90 | Premium J |
| 18421 | 1.52 | J | SI1 | 61.9 | 57.0 | 7491.0 | 7.37 | 7.33 | 4.55 | Ideal J |
| 18423 | 1.50 | J | VS2 | 62.6 | 58.0 | 7492.0 | 7.25 | 7.29 | 4.55 | Very Good J |
| 42108 | 0.65 | J | SI1 | 61.4 | 55.0 | 1276.0 | 5.58 | 5.62 | 3.44 | Ideal J |
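
Note that `sort_values` returns a sorted copy and leaves the original DataFrame untouched, and that ascending alphabetical order puts 'D' (the best color) first. A small sketch on toy data (`toy`, `by_color` and the two list names are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({"color": ["J", "D", "G"], "price": [335.0, 2757.0, 500.0]})

# Returns a new, sorted DataFrame; `toy` itself keeps its original row order.
by_color = toy.sort_values("color", ascending=True)

sorted_colors = by_color["color"].tolist()
original_colors = toy["color"].tolist()
print(sorted_colors, original_colors)
```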


### Write a Pandas program to drop a row if any (or all) of its values are missing. Print the DataFrame before and after dropping the rows with missing values and compare.


In [67]:
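# Before: a notebook cell displays only its last expression, so this
# preview is not shown unless it is wrapped in print().
diamonds.head(50)

# Drop every row that contains at least one missing value, in place.
diamonds.dropna(axis=0, how='any', inplace=True)

# After: rows 3 and 5, which contained NaN, no longer appear.
diamonds.head(50)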

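|  | carat | color | clarity | depth | table | price | x | y | z | quality-color |
| :- | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- |
| 0 | 0.23 | E | SI2 | 61.5 | 55.0 | 326.0 | 3.95 | 3.98 | 2.43 | Ideal E |
| 1 | 0.21 | E | SI1 | 59.8 | 61.0 | 326.0 | 3.89 | 3.84 | 2.31 | Premium E |
| 2 | 0.23 | E | VS1 | 56.9 | 65.0 | 327.0 | 4.05 | 4.07 | 2.31 | Good E |
| 4 | 0.31 | J | SI2 | 63.3 | 58.0 | 335.0 | 4.34 | 4.35 | 2.75 | Good J |
| 6 | 0.24 | I | VVS1 | 62.3 | 57.0 | 336.0 | 3.95 | 3.98 | 2.47 | Very Good I |
| 7 | 0.26 | H | SI1 | 61.9 | 55.0 | 337.0 | 4.07 | 4.11 | 2.53 | Very Good H |
| 8 | 0.22 | E | VS2 | 65.1 | 61.0 | 337.0 | 3.87 | 3.78 | 2.49 | Fair E |
| 9 | 0.23 | H | VS1 | 59.4 | 61.0 | 338.0 | 4.0 | 4.05 | 2.39 | Very Good H |
| 10 | 0.3 | J | SI1 | 64.0 | 55.0 | 339.0 | 4.25 | 4.28 | 2.73 | Good J |
| 11 | 0.23 | J | VS1 | 62.8 | 56.0 | 340.0 | 3.93 | 3.9 | 2.46 | Ideal J |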
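
To make the before/after comparison concrete, it can help to count the missing values per column and the rows on each side of the drop. A minimal sketch on toy data with deliberately missing entries (`toy`, `cleaned` and the counter names are illustrative):

```python
import pandas as pd

# Toy data: row 1 lacks a clarity value, row 2 lacks a y value.
toy = pd.DataFrame({
    "carat": [0.23, 0.29, 0.24],
    "clarity": ["SI2", None, "VVS2"],
    "y": [3.98, 4.23, None],
})

# Number of missing values in each column before dropping anything.
missing_per_column = toy.isna().sum()

# how="any" drops a row as soon as one of its values is missing.
cleaned = toy.dropna(axis=0, how="any")

rows_before, rows_after = len(toy), len(cleaned)
print(missing_per_column)
print(f"Rows before: {rows_before}, after: {rows_after}")
```

Unlike the `inplace=True` call in the cell above, this version keeps the original frame around, so both states can be inspected side by side.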


### Write a Pandas program to show the diamonds whose length (x), width (y) and depth (z) are all greater than 5 mm.

In [68]:
diamonds[(diamonds['x'] > 5) & (diamonds['y'] > 5) & (diamonds['z'] > 5)]

|  | carat | color | clarity | depth | table | price | x | y | z | quality-color |
| :- | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- |
| 11778 | 1.83 | J | I1 | 70.0 | 58.0 | 5083.0 | 7.34 | 7.28 | 5.12 | Fair J |
| 13002 | 2.14 | J | I1 | 69.4 | 57.0 | 5405.0 | 7.74 | 7.70 | 5.36 | Fair J |
| 13118 | 2.15 | J | I1 | 65.5 | 57.0 | 5430.0 | 8.01 | 7.95 | 5.23 | Fair J |
| 13562 | 1.96 | F | I1 | 66.6 | 60.0 | 5554.0 | 7.59 | 7.56 | 5.04 | Fair F |
| 13757 | 2.22 | J | I1 | 66.7 | 56.0 | 5607.0 | 8.04 | 8.02 | 5.36 | Fair J |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 27748 | 2.00 | G | SI1 | 63.5 | 56.0 | 18818.0 | 7.90 | 7.97 | 5.04 | Very Good G |
| 27749 | 2.29 | I | VS2 | 60.8 | 60.0 | 18823.0 | 8.50 | 8.47 | 5.16 | Premium I |
| 48410 | 0.51 | E | VS1 | 61.8 | 54.7 | 1970.0 | 5.12 | 5.15 | 31.80 | Very Good E |
| 49189 | 0.51 | E | VS1 | 61.8 | 55.0 | 2075.0 | 5.15 | 31.80 | 5.12 | Ideal E |
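
The same filter can also be written with `DataFrame.query`, which some readers find easier to scan than chained boolean masks. A sketch on three toy rows whose `x`, `y`, `z` values are copied from the outputs above (`toy`, `mask_result` and `query_result` are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame({
    "x": [7.34, 3.95, 5.15],
    "y": [7.28, 3.98, 31.80],
    "z": [5.12, 2.43, 5.12],
})

# Boolean-mask form: each comparison needs its own parentheses around `&`.
mask_result = toy[(toy["x"] > 5) & (toy["y"] > 5) & (toy["z"] > 5)]

# Equivalent query-string form; column names are used directly.
query_result = toy.query("x > 5 and y > 5 and z > 5")

print(query_result)
```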
