In [None]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
fruits = pd.read_table('fruit_data_with_colors.txt')
fruits.head()

## Dataset Analysis
To get a broader view of the dataset, let's print its dimensions, the unique fruit names, and the number of samples per fruit to see what we're working with.

In [None]:
print(fruits.shape) # Dataframe size
print(fruits['fruit_name'].unique()) # Unique fruit_names in the dataframe
print(fruits.groupby('fruit_name').size()) # Total number of each type of fruit

We can see that we have:
- 59 rows in the table, each with 7 columns (features/attributes)
- 4 unique fruit names (apple, mandarin, orange, lemon)
- The number of samples for each type of fruit
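The per-fruit counts are easier to compare when shown next to each class's share of the dataset. Below is a minimal sketch of that idea; the helper name `class_balance` and the toy rows are hypothetical and only stand in for the real `fruits` dataframe.

```python
import pandas as pd

def class_balance(df, label_col="fruit_name"):
    """Return per-class counts and each class's share of all rows."""
    counts = df[label_col].value_counts()
    return pd.DataFrame({"count": counts, "share": counts / len(df)})

# Toy rows with made-up labels, used only to show the output shape
toy = pd.DataFrame({"fruit_name": ["apple", "apple", "lemon", "orange"]})
balance = class_balance(toy)
print(balance)
```

With the real data loaded, `class_balance(fruits)` should give the same kind of table, which makes any class imbalance visible at a glance.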

### Next, let's graph these counts using seaborn:

In [None]:
import seaborn as sns
sns.countplot(data=fruits, x='fruit_name')
plt.show()

## Visualization
Let's create a few different charts for the numeric attributes of the fruits (mass, width, height, color_score)
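The plotting cells in this section rely on pandas skipping non-numeric columns such as `fruit_name`. If you prefer to select the numeric attributes explicitly, one possible approach is sketched here; `numeric_features` is a hypothetical helper and the toy values are invented for illustration.

```python
import pandas as pd

def numeric_features(df, exclude=("fruit_label",)):
    """List numeric column names, skipping label-like columns in `exclude`."""
    numeric = df.select_dtypes(include="number").columns
    return [col for col in numeric if col not in exclude]

# Toy frame with invented values that mimics a few of the dataset's columns
toy = pd.DataFrame({
    "fruit_label": [1, 2],
    "fruit_name": ["apple", "lemon"],
    "mass": [180, 120],
    "width": [8.0, 6.0],
})
features = numeric_features(toy)
print(features)
```

On the real dataframe, `fruits[numeric_features(fruits)]` would then be the input to the box plots and histograms.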

### Box Plots

In [None]:
fruits.drop('fruit_label', axis=1).plot(
    kind='box', subplots=True, layout=(2, 2), sharex=False, sharey=False,
    figsize=(9, 9), title='Box Plot for each numeric attribute')
plt.savefig('fruits_box')
plt.show()

### Histogram

In [None]:
fruits.drop('fruit_label', axis=1).hist(bins=30, figsize=(9, 9))
plt.suptitle("Histogram for each numeric attribute")
plt.savefig('fruits_hist')
plt.show()

Some pairs of attributes appear to be correlated (for example, mass and width). This suggests that a fruit's attribute values are related in predictable ways. A scatter matrix lets us inspect every pair of attributes at once.
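To put a number on a visual impression of correlation, one option is to compute Pearson correlation coefficients and rank the attribute pairs by strength. The sketch below assumes that approach; `ranked_correlations` is a hypothetical helper, and the toy values are made up rather than taken from the fruit dataset.

```python
import pandas as pd

def ranked_correlations(df, columns):
    """Return (col_a, col_b, r) tuples sorted by |Pearson r|, strongest first."""
    corr = df[columns].corr()  # Pearson correlation by default
    pairs = []
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            pairs.append((a, b, corr.loc[a, b]))
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)

# Made-up values: width rises linearly with mass, color_score does not
toy = pd.DataFrame({
    "mass": [100, 150, 200, 250],
    "width": [5.0, 6.0, 7.0, 8.0],
    "color_score": [0.9, 0.6, 0.8, 0.7],
})
ranked = ranked_correlations(toy, ["mass", "width", "color_score"])
for a, b, r in ranked:
    print(f"{a} vs {b}: r = {r:.2f}")
```

Applied to the real data, `ranked_correlations(fruits, ['mass', 'width', 'height', 'color_score'])` would show which pairs are most strongly related.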

In [None]:
from pandas.plotting import scatter_matrix
feature_names = ['mass', 'width', 'height', 'color_score']
X = fruits[feature_names]
y = fruits['fruit_label']
# Colormap used to color each point by its fruit_label
cmap = plt.get_cmap('gnuplot')
scatter = scatter_matrix(X, c=y, marker='o', s=40, hist_kwds={'bins': 15}, figsize=(9, 9), cmap=cmap)
plt.suptitle('Scatter-matrix for each numeric attribute')
plt.savefig('fruits_scatter_matrix')
plt.show()