## Reading Stata files

The `read_stata()` function loads data from a Stata (`.dta`) file into a pandas DataFrame. We just need to import the `pandas` library and specify the file path of our data.

In [None]:
import pandas as pd

In [None]:
empresas = pd.read_stata('BID2.EMPRESAS.dta')
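
The call above reads every column in the file. As a minimal sketch of an optional parameter, the example below writes a small made-up DataFrame to a temporary `.dta` file and reads back only some of its columns. The toy data and the file name `toy.dta` are assumptions for illustration and are not part of `BID2.EMPRESAS.dta`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data, used only for illustration.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "tipo_persona": ["FISICA", "MORAL", "MORAL"],
    "municipio": ["A", "B", "C"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.dta")
    # write_index=False keeps the row index out of the Stata file.
    toy.to_stata(path, write_index=False)

    # columns= restricts the read to a subset of variables.
    subset = pd.read_stata(path, columns=["id", "municipio"])

print(subset)
```

Reading only the columns you need can help with large files, since the rest are never loaded into the DataFrame.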

## Exploring the dataframe
`.head()` shows the first rows. Called with no argument, it returns the first 5 rows.

`.tail()` shows the last rows. `.tail(3)` will show the last 3 rows.


In [None]:
empresas.head()
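
To make the default and the explicit row count concrete, here is a small sketch on a made-up ten-row frame (the `toy` DataFrame is hypothetical, not the `empresas` data).

```python
import pandas as pd

# Hypothetical toy frame with ten rows, n = 0..9.
toy = pd.DataFrame({"n": range(10)})

first_five = toy.head()    # no argument: first 5 rows
last_three = toy.tail(3)   # explicit count: last 3 rows

print(first_five)
print(last_three)
```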

`.info()` reports the number of entries, the columns, and their data types.

In [24]:
empresas.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 193551 entries, 0 to 193550
Data columns (total 13 columns):
id                        193551 non-null int32
id_empresa                193551 non-null float64
tipo_persona              193551 non-null object
tipo_sociedad             193551 non-null object
descripcion_empresa       193551 non-null object
actividad                 193551 non-null object
tipo_empleador            193551 non-null object
como_se_entero_del_sne    193551 non-null object
entidadfed_sne            193551 non-null object
municipio                 193551 non-null object
localidad                 193551 non-null object
codigo_postal             193551 non-null object
fuente                    193551 non-null object
dtypes: float64(1), int32(1), object(11)
memory usage: 19.9+ MB


`.describe()` generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

In [None]:
empresas.describe()
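
By default, `describe()` summarizes only the numeric columns, so for a frame where most columns hold text it covers just a few of them. The sketch below, on made-up data, shows one way to include the text columns as well by passing `include="all"`; the `toy` frame is an assumption for illustration.

```python
import pandas as pd

# Hypothetical toy data: one numeric column, one text column.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "tipo_persona": ["FISICA", "MORAL", "MORAL", "MORAL"],
})

numeric_summary = toy.describe()            # numeric columns only
full_summary = toy.describe(include="all")  # adds unique, top, freq for text columns

print(numeric_summary)
print(full_summary)
```

For text columns the summary reports the number of distinct values (`unique`), the most frequent value (`top`), and how often it occurs (`freq`).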

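`.nunique()` counts the distinct values along the requested axis (per column by default). Missing values are not counted unless `dropna=False` is passed.

`.unique()` returns the unique values of a single column (a Series), in order of appearance.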

In [None]:
empresas.nunique()

In [None]:
empresas['tipo_sociedad'].unique()
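
The difference between the two is easiest to see on a small example. This sketch uses made-up values, including one missing entry, to show how each call treats it; the `toy` frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy data with one missing value in tipo_sociedad.
toy = pd.DataFrame({
    "tipo_sociedad": ["SA", "SC", "SA", None],
    "municipio": ["A", "A", "B", "B"],
})

per_column = toy.nunique()                  # distinct non-missing values per column
with_missing = toy.nunique(dropna=False)    # counts the missing value as well
values = toy["tipo_sociedad"].unique()      # array in order of appearance, missing value included

print(per_column)
print(with_missing)
print(values)
```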

## Writing CSV files

In [None]:
empresas.to_csv('empresas.csv')

Let's see if that worked the way we wanted.

In [None]:
df = pd.read_csv('empresas.csv')
df.head()

What's this `Unnamed: 0` column? `to_csv()` writes our index to the file unless we tell it not to, and `read_csv()` then reads it back as an unnamed column. To leave the index out, we pass the parameter `index=False`.

In [None]:
empresas.to_csv('empresas.csv', index=False)

In [None]:
df = pd.read_csv('empresas.csv')
df.head()
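
The same round trip can be reproduced on a tiny made-up frame, which makes the effect of `index=False` easy to check. This is a sketch under stated assumptions: the `toy` data and the file names are hypothetical, and the files are written to a temporary directory. It also shows an alternative, `index_col=0`, which reads the stored index back as the index instead of as a column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data, used only for illustration.
toy = pd.DataFrame({"id": [10, 20], "municipio": ["A", "B"]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "with_index.csv")
    without_index = os.path.join(tmp, "without_index.csv")

    toy.to_csv(with_index)                  # index written as an unnamed first column
    toy.to_csv(without_index, index=False)  # index left out

    cols_with = list(pd.read_csv(with_index).columns)
    cols_without = list(pd.read_csv(without_index).columns)

    # Alternative: keep the index in the file and restore it when reading.
    restored = pd.read_csv(with_index, index_col=0)

print(cols_with)
print(cols_without)
```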