Closed
Labels: bug (Something isn't working)
Description
Describe the bug
When I try to save a DataFrame as CSV, only around 400K lines are saved, although the data has more than 1M lines.
To Reproduce
My code:
use datafusion::prelude::*;
use log::{debug, info, LevelFilter, trace};
use crate::datapipeline::data_utils::*;
pub mod datapipeline;
use datafusion::logical_plan::when;
use datafusion::arrow::datatypes::DataType::{Int64, Utf8};

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx: SessionContext = SessionContext::new();
    let raw_fato_path: &str = "data/minilake/raw/fato_census/Data8277.csv";
    let stage_fato_path: &str = "data/minilake/stage/fato_census/";

    let fato_census_df = ctx.read_csv(raw_fato_path, CsvReadOptions::new()).await?;
    let fato_census_df = fato_census_df.with_column("area", cast(col("Area"), Utf8))?;
    let fato_census_df = fato_census_df
        //.with_column("Area",concat_ws("-", &vec![lit("A"),col("Area")]))?
        .select(vec![
            col("Year").alias("year"),
            col("Age").alias("age"),
            col("Ethnic").alias("ethnic"),
            col("Sex").alias("sex"),
            col("Area").alias("area"),
            col("count").alias("total_count"),
        ])?;

    // We can see the ..C values in the count column
    fato_census_df.show_limit(5).await?;
    print_schema_of_dataframe(&fato_census_df).await?;

    // Build an expression that replaces the "..C" placeholder with 0
    let transform_count_data = when(col("total_count").eq(lit("..C")), lit(0_u32))
        .otherwise(col("total_count"))?;

    // Cast the column to Int64
    let fato_census_df = fato_census_df.with_column(
        "total_count",
        cast(transform_count_data, Int64),
    )?;

    fato_census_df.write_csv(stage_fato_path).await?;
    Ok(())
}
Dataset:
Age and sex by ethnic group (grouped total responses), for census usually resident population counts, 2006, 2013, and 2018 Censuses (RC, TA, SA2, DHB)
Expected behavior
All lines from the input are saved to the output CSV.
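One way to check how many rows actually reached disk is to count them across every file in the output directory rather than in a single file. The sketch below is not part of the original report. It assumes that write_csv may emit several part files into the target directory, that each file begins with a header row, and that no field contains an embedded newline. The helper name count_csv_rows is hypothetical and uses only the standard library.

```rust
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Hypothetical helper: counts data rows across every `.csv` file in `dir`.
/// Assumptions: each file starts with one header line, and no quoted field
/// spans multiple lines (so one line equals one record).
fn count_csv_rows(dir: &Path) -> io::Result<usize> {
    let mut total = 0usize;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Skip anything that is not a CSV file.
        if path.extension().and_then(|e| e.to_str()) != Some("csv") {
            continue;
        }
        let mut lines_in_file = 0usize;
        for line in BufReader::new(File::open(&path)?).lines() {
            line?; // propagate read errors
            lines_in_file += 1;
        }
        // Subtract the header line; an empty file contributes zero rows.
        total += lines_in_file.saturating_sub(1);
    }
    Ok(total)
}
```

Calling count_csv_rows(Path::new("data/minilake/stage/fato_census/")) after the write and comparing the result with the line count of the input file would show whether rows are missing or only spread across several files.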