---
#image: dataloading.png
title: Modeling libraries in Python
subtitle: Data wrangling, model fitting, and scoring
date: '2024-03-01'
categories: [Python, Pandas, time_series]
author: Kunal Khurana
jupyter: python3
toc: True
---

## Interfacing between pandas and model code

## Creating model descriptions with patsy
- Data transformations in patsy formulas
- Categorical data and patsy

## Introduction to statsmodels
- Estimating linear models
- Estimating time series processes


## Introduction to scikit-learn




## Interface (data loading and cleaning before model building)

```python
import pandas as pd
import numpy as np
```

```python
data = pd.DataFrame({
    "x" :[1, 2, 3, 4, 5],
    "y" :[0.1, .2, 0.4, .6, .7],
    "z" :[-1, -3, -.45, -5.6, 4]
})
```

```python
data
```

|   | x | y | z |
|---|---|---|---|
| 0 | 1 | 0.1 | -1.0 |
| 1 | 2 | 0.2 | -3.0 |
| 2 | 3 | 0.4 | -0.45 |
| 3 | 4 | 0.6 | -5.6 |
| 4 | 5 | 0.7 | 4.0 |


```python
data.to_numpy()
```

```
array([[ 1.  ,  0.1 , -1.  ],
       [ 2.  ,  0.2 , -3.  ],
       [ 3.  ,  0.4 , -0.45],
       [ 4.  ,  0.6 , -5.6 ],
       [ 5.  ,  0.7 ,  4.  ]])
```
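The array above prints `1.` rather than `1` because `to_numpy()` returns a single homogeneous array, so the integer column `x` is upcast to the common type of all columns. A minimal check, reusing the same example data:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [0.1, 0.2, 0.4, 0.6, 0.7],
    "z": [-1, -3, -0.45, -5.6, 4],
})

# Per-column dtypes: x is stored as integers, y and z as floats
print(data.dtypes)

# to_numpy() must pick one dtype for the whole 2D array,
# so the integer column is upcast to float64
arr = data.to_numpy()
print(arr.dtype)  # float64
```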

```python
# Convert the NumPy array back into a DataFrame with new column names
df2 = pd.DataFrame(data.to_numpy(),
                  columns= ['one', 'two', 'three'])
```

```python
df2
```

|   | one | two | three |
|---|---|---|---|
| 0 | 1.0 | 0.1 | -1.0 |
| 1 | 2.0 | 0.2 | -3.0 |
| 2 | 3.0 | 0.4 | -0.45 |
| 3 | 4.0 | 0.6 | -5.6 |
| 4 | 5.0 | 0.7 | 4.0 |
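Rebuilding a `DataFrame` from the array works as in `df2`, but the original per-column dtypes are not recovered: every column comes back as `float64`. One way to restore them, sketched here on the assumption that the original `data` frame is still available, is to reuse its `dtypes` with `astype`:

```python
import pandas as pd

data = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [0.1, 0.2, 0.4, 0.6, 0.7],
    "z": [-1, -3, -0.45, -5.6, 4],
})

# Round-trip through NumPy, keeping the original column labels and index
roundtrip = pd.DataFrame(data.to_numpy(), columns=data.columns, index=data.index)
print(roundtrip.dtypes)  # x is now float64, not int64

# Cast each column back to the dtype it had in the original frame
restored = roundtrip.astype(data.dtypes.to_dict())
print(data.equals(restored))  # True
```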


```python
df3 = data.copy()
```

```python
df3['strings'] = ['a', 'b', 'c', 'd', 'e']

df3
```

|   | x | y | z | strings |
|---|---|---|---|---|
| 0 | 1 | 0.1 | -1.0 | a |
| 1 | 2 | 0.2 | -3.0 | b |
| 2 | 3 | 0.4 | -0.45 | c |
| 3 | 4 | 0.6 | -5.6 | d |
| 4 | 5 | 0.7 | 4.0 | e |


```python
df3.to_numpy()
```

```
array([[1, 0.1, -1.0, 'a'],
       [2, 0.2, -3.0, 'b'],
       [3, 0.4, -0.45, 'c'],
       [4, 0.6, -5.6, 'd'],
       [5, 0.7, 4.0, 'e']], dtype=object)
```
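Because `strings` holds Python strings, `to_numpy()` falls back to `dtype=object`, which most numerical model code cannot use directly. One common workaround (one option among several, not something the example above does) is to one-hot encode the text column with `pd.get_dummies`, so that every column of the frame is numeric:

```python
import pandas as pd

df3 = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [0.1, 0.2, 0.4, 0.6, 0.7],
    "z": [-1, -3, -0.45, -5.6, 4],
    "strings": ["a", "b", "c", "d", "e"],
})

# Replace the text column with one indicator column per category;
# dtype=float makes the indicators 0.0/1.0 instead of booleans
encoded = pd.get_dummies(df3, columns=["strings"], dtype=float)
print(list(encoded.columns))
# ['x', 'y', 'z', 'strings_a', 'strings_b', 'strings_c', 'strings_d', 'strings_e']

# With only numeric columns left, the array is float64 instead of object
arr = encoded.to_numpy()
print(arr.dtype)  # float64
```

For a regression that also has an intercept, passing `drop_first=True` drops one indicator column to avoid perfect collinearity.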

```python
# To use a subset of columns, select them with .loc, then call to_numpy()

model_cols = ["x", "y"]

data.loc[:, model_cols].to_numpy()
```

```
array([[1. , 0.1],
       [2. , 0.2],
       [3. , 0.4],
       [4. , 0.6],
       [5. , 0.7]])
```
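Once the selected columns are a plain float array, they can be passed to model code. As a stand-in for a modeling library (an assumption for illustration; this section does not itself fit a model), the sketch below fits `y` on `x` by ordinary least squares with `np.linalg.lstsq`, adding a column of ones for the intercept:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [0.1, 0.2, 0.4, 0.6, 0.7],
    "z": [-1, -3, -0.45, -5.6, 4],
})

# Same subset selection as above: column 0 is x, column 1 is y
model_cols = ["x", "y"]
arr = data.loc[:, model_cols].to_numpy()

# Design matrix: a column of ones (intercept) next to the predictor x
X = np.column_stack([np.ones(len(arr)), arr[:, 0]])
y = arr[:, 1]

# Least-squares solution of X @ coef ≈ y
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef
print(intercept, slope)  # approximately -0.08 and 0.16
```

Building the intercept column by hand is exactly the bookkeeping that patsy formulas automate, since a patsy design matrix includes an `Intercept` term by default.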