Write csv not save all lines of dataframe #3783

@Miyake-Diogo

Describe the bug
When I try to save a DataFrame as CSV, only around 400K lines are saved, although the data has more than 1M lines.

To Reproduce
My code:

use datafusion::prelude::*;
use log::{debug, info, LevelFilter, trace};
use crate::datapipeline::data_utils::*;
pub mod datapipeline;
use datafusion::logical_plan::when;

use datafusion::arrow::datatypes::DataType::{Int64,Utf8};
#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
  let ctx: SessionContext = SessionContext::new();
  let raw_fato_path: &str = "data/minilake/raw/fato_census/Data8277.csv";
  let stage_fato_path: &str = "data/minilake/stage/fato_census/";
  let fato_census_df = ctx.read_csv(raw_fato_path,  
                                  CsvReadOptions::new()).await?;
  
  let fato_census_df = fato_census_df.with_column("area",cast(
    col("Area"),
    Utf8))?;

  let fato_census_df = fato_census_df
    //.with_column("Area",concat_ws("-", &vec![lit("A"),col("Area")]))?
    .select(vec![
      col("Year").alias("year"),
      col("Age").alias("age"),
      col("Ethnic").alias("ethnic"),
      col("Sex").alias("sex"),
      col("Area").alias("area"),
      col("count").alias("total_count")
      ])?;
  
  // We can see the ..C values in Count column
  fato_census_df.show_limit(5).await?;
  print_schema_of_dataframe(&fato_census_df).await?;
  // Build an expression that replaces the "..C" placeholder with 0
  let transform_count_data = when(col("total_count")
    .eq(lit("..C")), lit(0_u32))
    .otherwise(col("total_count"))?;

  //Cast column datatype
  let fato_census_df = fato_census_df.with_column(
    "total_count",
    cast(transform_count_data, Int64))?;
  
  fato_census_df.write_csv(stage_fato_path).await?;

  Ok(())
}
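
For reference, the intent of the `when(...).otherwise(...)` plus `cast` step above can be written as a plain-Rust analogue. This is only a sketch of the intended per-value behavior, not DataFusion code; `normalize_count` is a hypothetical helper name introduced here for illustration.

```rust
/// Plain-Rust analogue of the transformation applied to `total_count`:
/// the "..C" placeholder becomes 0, anything else is parsed as an i64.
/// `normalize_count` is a hypothetical helper, not a DataFusion function.
fn normalize_count(raw: &str) -> Option<i64> {
    if raw == "..C" {
        // Placeholder value in the dataset is replaced with 0.
        Some(0)
    } else {
        // Non-numeric values yield None instead of panicking.
        raw.trim().parse::<i64>().ok()
    }
}
```

Every input row should produce exactly one output value under this logic, so the transformation itself is not expected to drop rows.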

Dataset:

Age and sex by ethnic group (grouped total responses), for census usually resident population counts, 2006, 2013, and 2018 Censuses (RC, TA, SA2, DHB)

Expected behavior
See all lines saved:

image

But only this many lines are saved:
image
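
To double-check the saved row count, one option is to sum the rows over every CSV file in the output directory rather than looking at a single file. The sketch below uses only the standard library and assumes that the output path is a directory containing one or more `.csv` files, each starting with a header line; both are assumptions about the output, and `count_csv_rows` is a hypothetical helper name.

```rust
use std::fs;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Count data rows across every `.csv` file directly inside `dir`.
/// Assumes each file begins with exactly one header line (an assumption),
/// and that no field contains an embedded newline.
fn count_csv_rows(dir: &Path) -> io::Result<usize> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Skip anything that is not a .csv file.
        if path.extension().and_then(|e| e.to_str()) != Some("csv") {
            continue;
        }
        let reader = BufReader::new(fs::File::open(&path)?);
        let mut lines = 0usize;
        for line in reader.lines() {
            line?; // propagate read errors instead of silently counting them
            lines += 1;
        }
        // Subtract the header line; an empty file contributes 0 rows.
        total += lines.saturating_sub(1);
    }
    Ok(total)
}
```

Calling `count_csv_rows(Path::new("data/minilake/stage/fato_census/"))` after the write would give a total to compare against the row count of the input file.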
